Interrater Reliability for Multiple Raters in Clinical Trials of Ordinal Scale
Jungo Sawa, PhD Biometrics Department, R&D, Schering Plough KK, Tokyo, Japan Toshihiko Morikawa, PhD Biostatistics Center, Kurume University, Kurume, Japan
Key Words: Reliability; Multiple raters; Ordinal scale; Intraclass correlation coefficients; Weighted kappa statistics
Parts of this article were presented at the 38th Drug Information Association Annual Meeting (organized by Dr. J. Kaufmann, Schering AG, Berlin), Chicago, 2002.
Correspondence Address: Jungo Sawa, Shinjuku Park Tower 33F, Nishi Shinjuku 3-7-1, Shinjuku-ku, Tokyo, Japan, 163-1033.
This article discusses a method for evaluating the reliability of the overall ratings on ordinal scales given by multiple raters (k ≥ 3). It is shown that when the sample size (n) is large compared with the number of raters (n >> k), both the simple mean of the Fleiss-Cohen–type weighted kappa statistics averaged over all pairs of raters and the Davies-Fleiss-Schouten–type weighted kappa statistic for multiple raters are approximately equivalent to the intraclass correlation coefficient (ICC) obtained by assigning integer (natural number) scores to the ordinal categories. These kappa statistics and the corresponding ICCs are illustrated with the overall ratings given independently by three raters in several studies of diagnostic agents for magnetic resonance imaging. Both fixed and random effects for raters are discussed, and some methods for comparing treatment groups (test and reference) are proposed, with an interpretation of the reliability of the overall ratings on the ordinal scale.

INTRODUCTION
In Japan, randomized single-blind comparative studies have been conducted to evaluate new diagnostic imaging agents. Three radiologists independently rated the magnetic resonance imaging (MRI) films under blinded conditions, separately from the overall ratings of the investigators. However, the final ratings were determined by a council system rather than from the ratings of the individual raters. Therefore, it was widely believed that examining interrater reliability was unnecessary. Recently, however, the Pharmaceuticals and Medical Devices Agency started its consultation system, and its consultants began to discuss various statistical issues concerning the overall ratings of new diagnostic imaging agents, as recommended in the Food and Drug Administration guidance (1). They require not only the validity of the overall evaluation but also its reliability, based on the ICH statistical guideline, ICH E9 (2). In comparative studies of diagnostic agents, the primary endpoint is usually a global assessment variable on an ordinal scale for each patient. Therefore, in this article, we consider kappa-type statistics for assessing reliability. The kappa statistic for a multinomial scale scored by multiple raters was developed by Davies and Fleiss (3). The kappa statistic for an ordinal scale scored by multiple raters was introduced by Schouten (4). In addition, Morikawa and Seki
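The approximate equivalence claimed above (mean pairwise weighted kappa ≈ ICC with integer scores when n >> k) can be checked numerically. The sketch below is an illustration, not the authors' code: it implements the Fleiss-Cohen quadratic weighted kappa averaged over all rater pairs, and the two-way random-effects ICC for absolute agreement (Shrout-Fleiss ICC(2,1)) on the integer scores. The simulated data, the 1-5 score range, and the variance parameters are all assumptions made for the demonstration.

```python
import numpy as np

def quadratic_weighted_kappa(x, y):
    """Fleiss-Cohen (quadratic-weight) kappa for two raters' integer scores."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    msd = np.mean((x - y) ** 2)  # observed mean squared disagreement
    # expected squared disagreement if the two raters were independent
    chance = x.var() + y.var() + (x.mean() - y.mean()) ** 2
    return 1.0 - msd / chance

def mean_pairwise_kappa(ratings):
    """Simple mean of pairwise weighted kappas; ratings is (n subjects, k raters)."""
    n, k = ratings.shape
    kappas = [quadratic_weighted_kappa(ratings[:, i], ratings[:, j])
              for i in range(k) for j in range(i + 1, k)]
    return float(np.mean(kappas))

def icc_agreement(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rating."""
    n, k = ratings.shape
    subj_means = ratings.mean(axis=1)
    rater_means = ratings.mean(axis=0)
    grand = ratings.mean()
    msr = k * np.sum((subj_means - grand) ** 2) / (n - 1)   # between subjects
    msc = n * np.sum((rater_means - grand) ** 2) / (k - 1)  # between raters
    resid = ratings - subj_means[:, None] - rater_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

if __name__ == "__main__":
    # Hypothetical data: n = 2000 subjects, k = 3 raters, latent trait plus
    # small rater shifts and noise, rounded and clipped to a 1-5 ordinal scale.
    rng = np.random.default_rng(0)
    n, k = 2000, 3
    latent = (3 + rng.normal(0, 1.0, (n, 1))
              + np.array([[-0.2, 0.0, 0.2]])
              + rng.normal(0, 0.5, (n, k)))
    scores = np.clip(np.rint(latent), 1, 5)
    # For n >> k the two statistics should nearly coincide.
    print("mean pairwise weighted kappa:", mean_pairwise_kappa(scores))
    print("ICC(2,1) on integer scores:  ", icc_agreement(scores))
```

For two raters the equality is exact in the limit: both statistics reduce to 2·Cov(X₁, X₂) / (S₁² + S₂² + (m₁ − m₂)²), with a discrepancy of order 1/n, which is why a large n relative to k is needed for the multi-rater approximation.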