Beyond kappa: an informational index for diagnostic agreement in dichotomous and multivalue ordered-categorical ratings
ORIGINAL ARTICLE
Alberto Casagrande¹ · Francesco Fabris¹ · Rossano Girometti²
Received: 2 March 2020 / Accepted: 29 August 2020 © The Author(s) 2020
Abstract
Agreement measures are useful tools both to compare different evaluations of the same diagnostic outcomes and to validate new rating systems or devices. Cohen's kappa (κ) is certainly the most popular agreement measure between two raters, and it has proved its effectiveness over the last sixty years. In spite of that, the method suffers from some alleged issues, which have been highlighted since the 1970s; moreover, its value is strongly dependent on the prevalence of the disease in the considered sample. This work introduces a new agreement index, the informational agreement (IA), which seems to avoid some of the flaws of Cohen's kappa and separates the contribution of prevalence from the nucleus of agreement. These goals are achieved by modelling agreement, in both the dichotomous and the multivalue ordered-categorical case, as the information shared between two raters through the virtual diagnostic channel connecting them: the more information exchanged between the raters, the higher their agreement. In order to test the fair behaviour and the effectiveness of the method, IA has been evaluated on some cases known to be problematic for κ, in a machine learning context, and in a clinical scenario comparing ultrasound (US) and the automated breast volume scanner (ABVS) in the setting of breast cancer imaging.

Keywords Diagnostic agreement · Cohen's kappa statistic · Multivalue ordered-categorical ratings · Inter-reader agreement · Information measures
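To make the channel view of agreement sketched in the abstract concrete, the snippet below estimates the mutual information between two raters' labels from their joint contingency table. It is a minimal sketch under our own assumptions (function name, bit-based logarithm, plug-in probability estimates, made-up counts), not the exact definition of IA given in the paper.

```python
# Minimal sketch: agreement read as the mutual information (in bits) shared
# across the "diagnostic channel" between two raters, estimated from counts.
# Names and example data are illustrative, not the paper's exact IA formula.
import numpy as np

def mutual_information(table):
    """Mutual information (bits) of the joint rating distribution."""
    p_xy = np.asarray(table, dtype=float)
    p_xy /= p_xy.sum()                      # joint probability estimates
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of rater 1
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of rater 2
    mask = p_xy > 0                         # skip zero cells (0 log 0 = 0)
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# 100 cases rated negative/positive by both raters:
#              rater 2: neg  pos
counts = [[45, 5],     # rater 1: neg
          [7, 43]]     # rater 1: pos
print(round(mutual_information(counts), 3))  # more shared bits = higher agreement
```

Under this reading, two raters who always agree share the full entropy of the rating distribution, while statistically independent raters share no information at all.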
Francesco Fabris
[email protected]

Alberto Casagrande
[email protected]

Rossano Girometti
[email protected]

1 Dipartimento di Matematica e Geoscienze, Università degli Studi di Trieste, Trieste, Italy

2 Dipartimento di Area Medica, Istituto di Radiologia, Ospedale S. Maria della Misericordia, Università degli Studi di Udine, Udine, Italy

1 Introduction

Diagnostic agreement is a measure used both to appraise the reliability of a diagnostic exam and to evaluate the accordance between different interpretations of the same diagnostic results. The same approach has also been used successfully in other domains, such as machine learning, to identify noise in data sets and to compare multiple predictors in ensemble methods (e.g. see [40, 45]). Many different techniques have been introduced so far to gauge diagnostic agreement. For instance, raw agreement [2], Cohen's kappa [13], intraclass correlation [44], McNemar's test [34], and the log odds ratio [22] have been proposed for the dichotomous case, i.e. when the rating scale admits only two values; conversely, weighted kappa [14], Fleiss–Cohen (quadratic) weights [23], intraclass correlation [2, 44], and association models [7] have been proposed for multivalue ordered-categorical ratings, i.e. when there are more than two admissible values.
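For comparison with the classical indices listed above, the sketch below computes Cohen's kappa from a contingency table and, with Fleiss–Cohen (quadratic) weights, its weighted variant for ordered-categorical ratings. The function name and the example counts are our own illustrations and are not taken from the paper.

```python
# Minimal sketch of Cohen's kappa and of weighted kappa with Fleiss-Cohen
# (quadratic) weights; the example tables are made up for illustration.
import numpy as np

def cohen_kappa(table, weights=None):
    """Cohen's kappa; weights='quadratic' gives the Fleiss-Cohen variant."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()                                     # observed joint proportions
    k = p.shape[0]
    i, j = np.indices((k, k))
    if weights == "quadratic":
        w = ((i - j) / (k - 1)) ** 2                 # quadratic disagreement weights
    else:
        w = (i != j).astype(float)                   # unweighted: 0 on the diagonal
    chance = np.outer(p.sum(axis=1), p.sum(axis=0))  # expected under independence
    return 1.0 - (w * p).sum() / (w * chance).sum()  # 1 - observed/chance disagreement

print(round(cohen_kappa([[45, 5], [7, 43]]), 3))               # dichotomous ratings
print(round(cohen_kappa([[20, 5, 0], [4, 30, 6], [1, 5, 29]],
                        weights="quadratic"), 3))              # 3-level ordinal scale
```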