A measure of interrater absolute agreement for ordinal categorical data
ORIGINAL PAPER
Giuseppe Bove² · Pier Luigi Conti¹ · Daniela Marella²

¹ Dipartimento di Scienze Statistiche, Università "La Sapienza", P.le A. Moro 5, 00185 Rome, Italy
² Dipartimento di Scienze della Formazione, Università "Roma Tre", via del Castro Pretorio 20, 00185 Roma, Italy

Accepted: 8 November 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
A measure of interrater absolute agreement for ordinal scales is proposed, capitalizing on the dispersion index for ordinal variables proposed by Giuseppe Leti. The procedure overcomes the limits affecting traditional measures of interrater agreement in different fields of application. An unbiased estimator of the proposed measure is introduced, and its sampling properties are investigated. Confidence intervals for interrater absolute agreement are constructed using both asymptotic results and bootstrap methods, and their performance is evaluated. Simulated data are employed to demonstrate the accuracy and practical utility of the new procedure for assessing agreement. Finally, an application to a real case is provided.

Keywords Ordinal data · Interrater agreement · Resampling
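For orientation, the dispersion index for ordinal variables due to Leti, on which the proposed measure builds, can be written in terms of the cumulative relative frequencies F_i of the k ordered categories; the normalized form on the right is one common choice (a standard formulation recalled here, not necessarily the paper's own notation):

D = 2 \sum_{i=1}^{k-1} F_i (1 - F_i), \qquad D^{*} = \frac{4}{k-1} \sum_{i=1}^{k-1} F_i (1 - F_i) \in [0, 1].

The index D attains its maximum (k - 1)/2 when F_i = 1/2 for every i < k, i.e., when the ratings are evenly split between the two extreme categories (maximal dispersion), and equals 0 when all ratings fall in a single category.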
1 Introduction

Ordinal rating scales are frequently developed in study designs where several raters (or judges) evaluate a group of targets. For instance, in language studies, new rating scales are tested out before their routine application by a group of raters, who assess the language proficiency of a corpus of argumentative (written or oral) texts
produced by a group of writers. Similar situations can be found in organizational, educational, biomedical, social, and behavioural research areas, where raters can be counsellors, teachers, clinicians, evaluators, or consumers, and targets can be organization members, students, patients, subjects, or objects. When each rater evaluates each target, the raters provide comparable categorizations of the targets. The more the raters' categorizations coincide, the more confidently the rating scale can be used, without worrying about which raters produced those categorizations. Hence, the main interest here consists in analysing the extent to which raters assign the same (or very similar) values on the rating scale (interrater absolute agreement), that is, in establishing to what extent raters' evaluations are close to an equality relationship (e.g., in the case of only two raters, if the two sets of ratings are represented by x and y, the relation of interest is x = y). Measures of interrater absolute agreement, such as Cohen's kappa [and extensions taking into account three or more raters, e.g., von Eye and Mun (2005)] and intraclass correlations (ICC) (Shrout and Fleiss 1979; McGraw and Wong 1996), are usually applied when dealing with rating p
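As a concrete point of reference for the two-rater case just described, the following sketch computes unweighted Cohen's kappa from first principles; the data and the function are hypothetical, for illustration only. Unweighted kappa credits only exact agreement (x = y) and ignores how far apart two discordant ordinal ratings are, which is one of the limitations of traditional measures that motivates the present proposal.

import numpy as np

def cohen_kappa(x, y, k):
    # x, y: integer ratings in {0, ..., k-1} by two raters, one entry per target.
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    p_o = np.mean(x == y)                 # observed proportion of exact agreement
    px = np.bincount(x, minlength=k) / n  # marginal rating distribution of rater 1
    py = np.bincount(y, minlength=k) / n  # marginal rating distribution of rater 2
    p_e = np.sum(px * py)                 # agreement expected under independence
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical ratings of 10 targets on a 4-point ordinal scale.
x = [0, 1, 1, 2, 2, 2, 3, 3, 1, 0]
y = [0, 1, 2, 2, 2, 3, 3, 3, 1, 0]
print(cohen_kappa(x, y, k=4))  # about 0.73 for these toy data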