Between Always and Never: Evaluating Uncertainty in Radiology Reports Using Natural Language Processing

ORIGINAL PAPER

Between Always and Never: Evaluating Uncertainty in Radiology Reports Using Natural Language Processing

Andrew L. Callen 1 & Sara M. Dupont 2 & Adi Price 3 & Ben Laguna 3 & David McCoy 3 & Bao Do 4 & Jason Talbott 3 & Marc Kohli 3 & Jared Narvid 3

Received: 12 December 2019 / Revised: 10 June 2020 / Accepted: 23 July 2020
© Society for Imaging Informatics in Medicine 2020

Abstract

The ideal radiology report reduces diagnostic uncertainty while avoiding ambiguity whenever possible. The purpose of this study was to characterize the use of uncertainty terms in radiology reports at a single institution and to compare the use of these terms across imaging modalities, anatomic sections, patient characteristics, and radiologist characteristics. We hypothesized that there would be variability among radiologists and between subspecialities within radiology regarding the use of uncertainty terms, and that the length of a report's impression would be a predictor of the use of uncertainty terms. Finally, we hypothesized that uncertainty terms would often be interpreted by human readers as "hedging." To test these hypotheses, we applied a natural language processing (NLP) algorithm that detects and counts terms from a published set of uncertainty terms within radiology report impressions. All 642,569 radiology report impressions from 171 reporting radiologists were collected from 2011 through 2015. For validation, two radiologists without knowledge of the software algorithm reviewed report impressions and were asked to determine whether each report was "uncertain" or "hedging." The presence of one or more uncertainty terms was then compared with the human readers' assessments. There were significant differences in the proportion of reports containing uncertainty terms across patient admission status and across anatomic imaging subsections. Reports with uncertainty terms were significantly longer than those without, although report length did not differ significantly between subspecialities or modalities. There were no significant differences in rates of uncertainty when comparing attending radiologists by years of experience. With reader 1 as the gold standard, accuracy was 0.91, sensitivity was 0.92, specificity was 0.90, and precision was 0.88, with an F1-score of 0.90. With reader 2 as the gold standard, accuracy was 0.84, sensitivity was 0.88, specificity was 0.82, and precision was 0.68, with an F1-score of 0.77. Substantial variability exists among radiologists and subspecialities regarding the use of uncertainty terms, and this variability cannot be explained by years of radiologist experience or by differences in the proportions of specific modalities. Furthermore, detection of uncertainty terms demonstrates good test characteristics for predicting human readers' assessment of uncertainty.

Keywords: Diagnostic uncertainty · Natural language processing
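The published paper does not include source code; the sketch below illustrates, in Python, the general approach the abstract describes: flagging a report impression as "uncertain" when it contains one or more terms from an uncertainty-term list, and scoring those flags against a human reader's labels with accuracy, sensitivity, specificity, precision, and F1-score. The term list shown and the function names (count_uncertainty_terms, is_uncertain, evaluate) are illustrative assumptions, not the published term set or the authors' implementation.

import re

# Illustrative subset of uncertainty terms; the study used a published
# term list, which this sketch does not reproduce.
UNCERTAINTY_TERMS = [
    "possible", "possibly", "probable", "probably", "may represent",
    "cannot exclude", "cannot be excluded", "suspicious for",
    "concerning for", "likely", "questionable", "equivocal",
]

# One case-insensitive, word-boundary pattern per term.
PATTERNS = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
            for t in UNCERTAINTY_TERMS]

def count_uncertainty_terms(impression: str) -> int:
    """Count occurrences of uncertainty terms in a report impression."""
    return sum(len(p.findall(impression)) for p in PATTERNS)

def is_uncertain(impression: str) -> bool:
    """Flag an impression as 'uncertain' if it contains 1 or more terms."""
    return count_uncertainty_terms(impression) >= 1

def evaluate(predictions, reader_labels):
    """Compare algorithm flags with a human reader's labels (gold standard)."""
    tp = sum(p and r for p, r in zip(predictions, reader_labels))
    tn = sum((not p) and (not r) for p, r in zip(predictions, reader_labels))
    fp = sum(p and (not r) for p, r in zip(predictions, reader_labels))
    fn = sum((not p) and r for p, r in zip(predictions, reader_labels))
    accuracy = (tp + tn) / len(predictions)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

Given parallel lists of impressions and one reader's labels, evaluate([is_uncertain(imp) for imp in impressions], reader_labels) would produce the same kind of comparison reported above for readers 1 and 2.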

Background
