Sauvayre R. (2022), "Types of Errors Hiding in Google Scholar Data", Journal of Medical Internet Research, Vol. 24, n°5, e28354. DOI: 10.2196/28354


Google Scholar (GS) is a free tool that researchers may use to analyze citations, to find appropriate literature, or to evaluate the quality of an author or a contender for tenure, promotion, a faculty position, funding, or research grants. GS has become a major bibliographic and citation database. According to the literature, databases such as PubMed, PsycINFO, Scopus, or Web of Science can be used in place of GS because they are more reliable. The aim of this study is to examine the accuracy of citation data collected from GS and to provide a comprehensive description of the errors and miscounts identified. For this purpose, 281 documents that cited two specific works were retrieved via the Publish or Perish (PoP) software and examined. Both cited works studied the false positive issue inherent in the analysis of neuroimaging data. The results reveal an unprecedented error rate: 279 of the 281 examined references (99.3%) contain at least one error. Nonacademic documents tend to contain more errors than academic publications (U=5117.0, P<.001). This viewpoint article, based on a case study examining GS data accuracy, shows that GS data not only fail to be accurate but also potentially expose researchers who use these data without verification to substantial biases in their analyses and results. Further work must be conducted to assess the consequences of using GS data extracted by PoP.
Keywords: Reference accuracy; database reliability; false positive; academic publication; research evaluation; scientometrics; citation analysis.