Sample size in bibliometric analysis
Gordon Rogers1 · Martin Szomszor1 · Jonathan Adams1,2
Received: 11 May 2020 © The Author(s) 2020
Abstract
While bibliometric analysis is normally able to rely on complete publication sets, this is not universally the case. For example, Australia (in ERA) and the UK (in the RAE/REF) use institutional research assessment that may rely on small or fractional parts of researcher output. Using the Category Normalised Citation Impact (CNCI) for the publications of ten universities with similar output (21,000–28,000 articles and reviews) indexed in the Web of Science for 2014–2018, we explore the extent to which a ‘sample’ of institutional data can accurately represent the averages and/or the correct relative status of the population CNCIs. Starting with full institutional data, we find a high variance in average CNCI across 10,000 institutional samples of fewer than 200 papers, which we suggest may be an analytical minimum although smaller samples may be acceptable for qualitative review. When considering the ‘top’ CNCI paper in researcher sets represented by DAIS-ID clusters, we find that samples of 1000 papers provide a good guide to relative (but not absolute) institutional citation performance, which is driven by the abundance of high performing individuals. However, such samples may be perturbed by scarce ‘highly cited’ papers in smaller or less research-intensive units. We draw attention to the significance of this for assessment processes and the further evidence that university rankings are innately unstable and generally unreliable.
Keywords Bibliometric sampling · CNCI · Citation impact · Research assessment · University ranking
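The repeated-sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the population is a synthetic, lognormally distributed stand-in for one institution's CNCI values (an assumption, chosen only because citation indicators are typically right-skewed), and the function names `make_synthetic_population` and `simulate_sample_means` are hypothetical. The demo draws 1,000 samples per size to keep the run short, whereas the paper reports 10,000.

```python
import random
import statistics


def make_synthetic_population(n_papers, seed=0):
    """Build a skewed, synthetic stand-in for one institution's CNCI values.

    Assumption: lognormal values with an expected mean of 1.0. This is
    illustrative only and is not fitted to any Web of Science data.
    """
    rng = random.Random(seed)
    sigma = 1.0
    mu = -sigma ** 2 / 2  # gives an expected value of exp(mu + sigma^2 / 2) = 1.0
    return [rng.lognormvariate(mu, sigma) for _ in range(n_papers)]


def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples without replacement; return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Compare the spread of sample means across a few sample sizes.
population = make_synthetic_population(25_000)
population_mean = statistics.fmean(population)
print(f"population mean: {population_mean:.3f}")
for size in (20, 200, 1000):
    means = simulate_sample_means(population, size, n_samples=1000)
    print(
        f"sample size {size:>4}: "
        f"min mean {min(means):.3f}, max mean {max(means):.3f}, "
        f"SD of means {statistics.stdev(means):.3f}"
    )
```

Under these assumptions, the spread of the sample means narrows as the sample size grows, which is the behaviour the study quantifies on real institutional data.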
* Jonathan Adams [email protected]
1 Institute for Scientific Information, Clarivate Analytics, 160 Blackfriars Road, London SE1 8EZ, UK
2 The Policy Institute, King’s College London, 22 Kingsway, London WC2B 6LE, UK
Scientometrics

Introduction
What is the minimum number of observations required to make an acceptably precise estimate of the true mean citation impact or describe the relative means of a number of datasets? Sampling to estimate the population mean is a widespread problem in many research areas (e.g. Adams, 1980), but it is less commonly an issue when estimating citation impact in bibliometrics because it is often possible to make use of complete data, i.e. the full publication set for one or more entities. Of course, we make the caveat that this is a complete dataset only insofar as it is complete for a particular source such as the Web of Science. Other but unrecorded publications usually exist. Circumstances may arise in research assessment where the analysis of all available publication data will not or cannot be the case, in which event some light needs to be shed on sampling acceptability. We have sought to explore this because it is a challenge posed to us by many users of bibliometric analysis. It is a truism that larger samples reduce the variance of the mean but at what sample size