A Meta-Analysis of Machine Learning-Based Science Assessments: Factors Impacting Machine-Human Score Agreements


Xiaoming Zhai¹ · Lehong Shi² · Ross H. Nehm³

Accepted: 12 October 2020
© Springer Nature B.V. 2020

Abstract

Machine learning (ML) has been increasingly employed in science assessment to facilitate automatic scoring efforts, although with varying degrees of success (i.e., magnitudes of machine-human score agreements [MHAs]). Little work has empirically examined the factors that impact MHA disparities in this growing field, thus constraining the improvement of machine scoring capacity and its wide applications in science education. We performed a meta-analysis of 110 studies of MHAs in order to identify the factors most strongly contributing to scoring success (i.e., high Cohen’s kappa [κ]). We empirically examined six factors proposed as contributors to MHA magnitudes: algorithm, subject domain, assessment format, construct, school level, and machine supervision type. Our analyses of 110 MHAs revealed substantial heterogeneity in κ (mean = .64; range = .09-.97, taking weights into consideration). Using three-level random-effects modeling, MHA score heterogeneity was explained by the variability both within publications (i.e., the assessment task level: 82.6%) and between publications (i.e., the individual study level: 16.7%). Our results also suggest that all six factors have significant moderator effects on scoring success magnitudes. Among these, algorithm and subject domain had significantly larger effects than the other factors, suggesting that technical features and assessment external features might be primary targets for improving MHAs and ML-based science assessments.

Keywords Machine learning · Science assessment · Meta-analysis · Interrater reliability · Validity · Cohen’s kappa · Artificial Intelligence
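For readers unfamiliar with the agreement statistic named in the abstract, the following is a minimal sketch of unweighted Cohen's κ computed between one set of human scores and one set of machine scores. It illustrates the standard textbook formula only; it is not the authors' analysis code, and the function name `cohens_kappa` and the example score lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(human, machine):
    """Unweighted Cohen's kappa for two equal-length lists of categorical scores.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    if len(human) != len(machine) or len(human) == 0:
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(human)

    # Observed agreement: share of responses given the same score by both raters.
    observed = sum(h == m for h, m in zip(human, machine)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    expected = sum(
        human_counts[c] * machine_counts[c] for c in human_counts
    ) / (n * n)

    # Kappa is undefined when chance agreement is already perfect
    # (both raters used one identical category throughout).
    if expected == 1.0:
        return float("nan")

    return (observed - expected) / (1.0 - expected)


# Made-up example: 10 responses scored 0/1 by a human rater and by a machine.
human_scores = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
machine_scores = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohens_kappa(human_scores, machine_scores)
print(round(kappa, 2))  # 0.6
```

In this made-up example the raters agree on 8 of 10 responses (observed agreement .80) while chance agreement is .50, so κ = (.80 - .50) / (1 - .50) = .60.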

* Xiaoming Zhai

1 Department of Mathematics and Science Education, University of Georgia, Athens 30602, Georgia, US

2 Department of Career and Information Studies, University of Georgia, Athens 30605, Georgia, US

3 Department of Ecology and Evolution, Stony Brook University, Stony Brook 11794, New York, US

Introduction

Machine learning (ML) ranks among the technological tools most dramatically transforming science assessment (Zhai et al. 2020a). ML lies at the intersection of computer science, statistics, data science, and artificial intelligence and is grounded in the expectation that machines are able to learn from prior experiences rather than merely execute pre-established commands (Samuel 1959). This perspective has revolutionized many fields of research and is significantly impacting many areas of science, technology, and society (Hutter et al. 2019; Mohri et al. 2018). Although ML applications are broadly diverse, two basic questions drive much of this work (Jordan and Mitchell 2015): (1) how can computer systems be built to automatically improve in response to experience, and (2) what are the fundamental statistical-comput
