Statistical stopping criteria for automated screening in systematic reviews
METHODOLOGY
Open Access
Max W Callaghan1,2,3* and Finn Müller-Hansen1,3
*Correspondence: [email protected]
1 Mercator Research Institute on Global Commons and Climate Change, EUREF Campus 19, Torgauer Straße 12-15, 10829 Berlin, Germany
2 Priestley International Centre for Climate, University of Leeds, Leeds, LS2 9JT, UK
Full list of author information is available at the end of the article
Abstract
Active learning for systematic review screening promises to reduce the human effort required to identify relevant documents for a systematic review. Machines and humans work together, with humans providing training data and the machine prioritising the documents that the humans screen. This enables the identification of all relevant documents after viewing only a fraction of the total documents. However, current approaches lack robust stopping criteria, so reviewers do not know when they have seen all, or a certain proportion of, the relevant documents. This means that such systems are hard to implement in live reviews. This paper introduces a workflow with flexible statistical stopping criteria, which offer real work reductions on the basis of rejecting a hypothesis of having missed a given recall target with a given level of confidence. On test datasets, the stopping criteria are shown to achieve a reliable level of recall while still providing work reductions of 17% on average. Other previously proposed methods are shown to provide inconsistent recall and work reductions across datasets.

Keywords: Systematic review, Machine learning, Active learning, Stopping criteria
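As a rough illustration of the hypothesis-testing idea described in the abstract, the sketch below (Python) frames a recall-targeted stopping rule as a one-sided test, under the simplifying assumption that the most recently screened documents can be treated as a random sample drawn without replacement from the pool that was unscreened when the sample began. The function name, arguments, and example numbers are illustrative, not the authors' implementation.

from math import floor
from scipy.stats import hypergeom

def stopping_p_value(pool_size, r_seen, n_sample, k_sample, recall_target):
    # pool_size: documents that were unscreened before the random sample began
    # r_seen: relevant documents found so far in total (including the sample)
    # n_sample: documents screened in the random sample
    # k_sample: relevant documents found within that sample
    # recall_target: e.g. 0.95
    r_before = r_seen - k_sample
    # Smallest number of relevant documents the pool would need to contain
    # for the achieved recall to fall below the target.
    k_missed_min = floor(r_seen / recall_target - r_before) + 1
    if k_missed_min > pool_size:
        return 0.0  # missing the target is impossible given the pool size
    # p-value for H0 "recall < target": probability of finding k_sample or
    # fewer relevant documents in the sample if the pool held k_missed_min.
    return hypergeom.cdf(k_sample, pool_size, k_missed_min, n_sample)

# Example: 1,000 unscreened documents, 400 then screened at random with no
# new relevant documents found, 200 relevant found overall, 95% recall target.
p = stopping_p_value(1000, 200, 400, 0, 0.95)
print(p)  # roughly 0.004, so H0 is rejected at alpha = 0.05 and screening can stop

The test is conservative: it evaluates the least unfavourable number of missed documents consistent with failing the target, and any larger number would make the observed sample even less likely.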
Background
Evidence synthesis technology is a rapidly emerging field that promises to change the practice of evidence synthesis work [1]. Interventions have been proposed at various points in order to reduce the human effort required to produce systematic reviews and other forms of evidence synthesis. A major strand of the literature works on screening: the identification of relevant documents in a set of documents whose relevance is uncertain [2]. This is a time-consuming and repetitive task, and in a research environment with constrained resources and increasing amounts of literature, this may limit the scope of the evidence synthesis projects undertaken. Several papers have developed active learning (AL) approaches [3–7] to reduce the time required to screen documents. This paper sets out how current approaches are unreliable in practice, and outlines and evaluates modifications that would make AL systems ready for live reviews.
Active learning is an iterative process in which documents screened by humans are used to train a machine learning model to predict the relevance of unseen papers [8]. The algorithm chooses which studies will next be screened by humans, often those which are likely to be relevant or about which the model is uncertain, in order to generate more labels to feed back to the machine. By prioritising the studies most likely to be relevant, all relevant documents can in principle be identified after screening only a fraction of the total collection.
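To make this loop concrete, the following sketch (Python) simulates a relevance-prioritising active learning screening process. The TF-IDF features, the logistic regression classifier, the batch size, and the oracle_labels argument standing in for human screening decisions are all illustrative choices for the sketch, not the specific system evaluated in this paper.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def simulate_al_screening(texts, oracle_labels, batch_size=100, seed=0):
    # texts: title/abstract strings; oracle_labels: 0/1 relevance decisions
    # standing in for the human screeners in this simulation.
    rng = np.random.default_rng(seed)
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    y = np.asarray(oracle_labels)
    unlabelled = np.arange(len(texts))
    labelled = np.array([], dtype=int)
    # Start with random batches so the first model has labels to learn from;
    # keep sampling until both classes are represented.
    while len(np.unique(y[labelled])) < 2 and len(unlabelled) > 0:
        batch = rng.choice(unlabelled, size=min(batch_size, len(unlabelled)), replace=False)
        labelled = np.concatenate([labelled, batch])
        unlabelled = np.setdiff1d(unlabelled, batch)
    while len(unlabelled) > 0:
        model = LogisticRegression(max_iter=1000).fit(X[labelled], y[labelled])
        # Certainty sampling: send the documents predicted most likely to be
        # relevant to the human screeners next.
        scores = model.predict_proba(X[unlabelled])[:, 1]
        batch = unlabelled[np.argsort(-scores)[:batch_size]]
        labelled = np.concatenate([labelled, batch])
        unlabelled = np.setdiff1d(unlabelled, batch)
        # A stopping criterion (such as the test sketched after the abstract)
        # would be evaluated here after each screened batch.
    return labelled  # the order in which documents were screened

In a live review, the oracle would be replaced by human screening decisions, and the loop would terminate as soon as a stopping criterion is satisfied rather than when the pool is exhausted.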