Assessment by Comparative Judgement: An Application to Secondary Statistics and English in New Zealand

Neil Marshall¹ · Kirsten Shaw¹ · Jodie Hunter² · Ian Jones³

Received: 4 November 2019 / Accepted: 31 March 2020 / Published online: 8 April 2020
© The Author(s) 2020

Abstract

There is growing interest in using comparative judgement to assess student work as an alternative to traditional marking. Comparative judgement requires no rubrics and is instead grounded in experts making pairwise judgements about the relative ‘quality’ of students’ work according to a high-level criterion. The resulting decision data are fitted to a statistical model to produce a score for each student. Cited benefits of comparative judgement over traditional methods include increased reliability, validity and efficiency of assessment processes. We investigated whether such claims apply to summative statistics and English assessments in New Zealand. Experts comparatively judged students’ responses to two national assessment tasks, and the reliability and validity of the outcomes were explored using standard techniques. We present evidence that the comparative judgement process efficiently produced reliable and valid assessment outcomes. We consider the limitations of the study, and make suggestions for further research and potential applications.

Keywords: Assessment · English · Statistics · Comparative judgement
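The abstract describes fitting pairwise decision data to a statistical model to produce a score per student. The abstract does not name the model here, but the comparative judgement literature typically uses the Bradley–Terry model; as an illustration only (not the authors' exact implementation), the following sketch fits Bradley–Terry strengths by maximum likelihood using the classic MM (Zermelo) iteration. The function name and the `(winner, loser)` input format are hypothetical.

```python
import math

def bradley_terry(comparisons, n_items, iters=200):
    """Fit Bradley-Terry strengths from pairwise judgement decisions.

    comparisons: list of (winner, loser) item-index pairs, one per judgement.
    Returns one score per item on a log (logit) scale, as CJ studies
    conventionally report. Illustrative sketch only; assumes every item
    wins at least once so the MM update stays positive.
    """
    wins = [0.0] * n_items
    pair_counts = {}  # (i, j) with i < j -> number of times the pair was judged
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    p = [1.0] * n_items  # initial strengths
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            # MM denominator: sum over opponents of n_ij / (p_i + p_j)
            denom = 0.0
            for (a, b), n in pair_counts.items():
                if a == i:
                    denom += n / (p[i] + p[b])
                elif b == i:
                    denom += n / (p[i] + p[a])
            new_p.append(wins[i] / denom if denom > 0 else p[i])
        # normalise so strengths keep a fixed overall scale
        s = sum(new_p)
        p = [x * n_items / s for x in new_p]

    return [math.log(x) for x in p]
```

With decisions in which item 0 mostly beats item 1, and item 1 mostly beats item 2, the fitted log-strengths recover that ordering, which is all the subsequent reliability and validity analyses require of the scores.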

Introduction

When a student’s assessment is assigned a grade we want that grade to be entirely dependent on the quality of the student’s work and not at all dependent on the biases and idiosyncrasies of whoever happened to assess it. Where we succeed we can say that the assessment is reliable, and where we fail we can say the assessment is unreliable (Berkowitz et al. 2000). Reliability in high-stakes educational assessment is

* Ian Jones [email protected]

1 New Zealand Qualifications Authority, Wellington, New Zealand
2 Institute of Education, Massey University, Wellington, New Zealand
3 Mathematics Education Centre, Loughborough University, Loughborough, UK

New Zealand Journal of Educational Studies (2020) 55:49–71

particularly important where students’ grades affect future educational and employment prospects. One way to ensure reliability is to use so-called objective tests in which answers are unambiguously right or wrong (Meadows and Billington 2005). Examples are an arithmetic test in mathematics, a spelling test in languages, or a multiple-choice test in any subject. Objective tests also have the advantage of being quick and inexpensive to score and grade because the task can be automated (Sangwin 2013) and in practice often is (Alomran and Chia 2018). For these reasons—reliability and efficiency—objective tests are common in education systems around the world (Black et al. 2012). However, objective tests are far from universal because they risk delivering reliability at the cost of validity (Assessment Research Group 2009; Wiliam 2001). Educationalists have argued that there is mor