A Meta-Analysis of Machine Learning-Based Science Assessments: Factors Impacting Machine-Human Score Agreements
Xiaoming Zhai1 · Lehong Shi2 · Ross H. Nehm3
Accepted: 12 October 2020
© Springer Nature B.V. 2020
Abstract
Machine learning (ML) has been increasingly employed in science assessment to facilitate automatic scoring efforts, although with varying degrees of success (i.e., magnitudes of machine-human score agreements [MHAs]). Little work has empirically examined the factors that impact MHA disparities in this growing field, thus constraining the improvement of machine scoring capacity and its wide applications in science education. We performed a meta-analysis of 110 studies of MHAs in order to identify the factors most strongly contributing to scoring success (i.e., high Cohen’s kappa [κ]). We empirically examined six factors proposed as contributors to MHA magnitudes: algorithm, subject domain, assessment format, construct, school level, and machine supervision type. Our analyses of 110 MHAs revealed substantial heterogeneity in κ (mean = .64; range = .09-.97, taking weights into consideration). Using three-level random-effects modeling, MHA score heterogeneity was explained by the variability both within publications (i.e., the assessment task level: 82.6%) and between publications (i.e., the individual study level: 16.7%). Our results also suggest that all six factors have significant moderator effects on scoring success magnitudes. Among these, algorithm and subject domain had significantly larger effects than the other factors, suggesting that technical features and assessment external features might be primary targets for improving MHAs and ML-based science assessments.
Keywords Machine learning · Science assessment · Meta-analysis · Interrater reliability · Validity · Cohen’s kappa · Artificial Intelligence
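For readers unfamiliar with the agreement statistic used throughout, the following is a minimal sketch of how an unweighted Cohen's kappa could be computed for one scoring task, where a human rater and a machine each assign a categorical score to the same responses. The function name `cohen_kappa` and the example score lists are illustrative assumptions, not data or code from the study; the study's own κ values may also involve weighting not shown here.

```python
from collections import Counter


def cohen_kappa(human, machine):
    """Unweighted Cohen's kappa for two equal-length sequences of categorical scores."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(human)
    # Observed agreement: share of responses given the same score by both raters.
    p_o = sum(h == m for h, m in zip(human, machine)) / n
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories (a category unused by one rater contributes 0).
    h_counts, m_counts = Counter(human), Counter(machine)
    p_e = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores for ten student responses (1 = correct, 0 = incorrect).
human_scores = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
machine_scores = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa(human_scores, machine_scores)
print(round(kappa, 3))  # 0.583
```

In this made-up example the raters agree on 8 of 10 responses (observed agreement .80), chance agreement is .52, and κ = (.80 - .52) / (1 - .52) ≈ .58, which illustrates why κ is lower than raw percent agreement.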
* Xiaoming Zhai
1 Department of Mathematics and Science Education, University of Georgia, Athens 30602, Georgia, US
2 Department of Career and Information Studies, University of Georgia, Athens 30605, Georgia, US
3 Department of Ecology and Evolution, Stony Brook University, Stony Brook 11794, New York, US
Introduction
Machine learning (ML) ranks among the technological tools most dramatically transforming science assessment (Zhai et al. 2020a). ML lies at the intersection of computer science, statistics, data science, and artificial intelligence and is grounded in the expectation that machines are able to learn from prior experiences rather than merely execute pre-established commands (Samuel 1959). This perspective has revolutionized many fields of research and is significantly impacting many areas of science, technology, and society (Hutter et al. 2019; Mohri et al. 2018). Although the diversity of ML applications is broad, two basic questions drive much of this work (Jordan and Mitchell 2015): (1) How can computer systems be built to automatically improve in response to experience? and (2) what are the fundamental statistical-comput