A Transparent and Adaptable Method to Extract Colonoscopy and Pathology Data Using Natural Language Processing


EDUCATION & TRAINING

Helene B. Fevrier 1 & Liyan Liu 1 & Lisa J. Herrinton 1 & Dan Li 1,2

Received: 18 February 2020 / Accepted: 15 July 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Key variables recorded as text in colonoscopy and pathology reports have been extracted using natural language processing (NLP) tools that were not easily adaptable to new settings. We aimed to develop a reliable NLP tool with broad adaptability. During 1996–2016, Kaiser Permanente Northern California performed 401,566 colonoscopies with linked pathology. We randomly sampled 1000 linked reports into a Training Set and developed an NLP tool using SAS® PERL regular expressions. The NLP tool captured five colonoscopy and pathology variables: type, size, and location of polyps; extent of procedure; and quality of bowel preparation. We used a Validation Set (N = 3000) to confirm the variables’ classifications using manual chart review as the reference. Performance of the NLP tool was assessed using the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Cohen’s κ. Cohen’s κ ranged from 93 to 99%. The sensitivity and specificity ranged from 95 to 100% across all categories. For categories with prevalence exceeding 10%, the PPV ranged from 97% to 100% except for adequate quality of preparation (prevalence 92%), for which the PPV was 65%. For categories with prevalence below 10%, the PPVs ranged from 62% to 100%. NPVs ranged from 94% to 100% except for the “complete” extent of procedure, for which the NPV was 73%. Using information from a large community-based population, we developed a transparent and adaptable NLP tool for extracting five colonoscopy and pathology variables. The tool can be readily tested in other healthcare settings.

Keywords Natural language processing · Pathology report · Colonoscopy
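To make the evaluation measures named in the abstract concrete, the following is a minimal Python sketch of a rule-based size extraction and of the agreement statistics computed from a 2×2 table against chart review. It is an illustration only, not the authors' SAS® implementation: the pattern `SIZE_RE`, the function names `largest_size_mm` and `agreement_metrics`, and the example counts are all hypothetical.

```python
import re

# Hypothetical pattern for sizes such as "5 mm" or "1.2 cm".
# This is an illustrative expression, not one taken from the authors' tool.
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def largest_size_mm(report_text):
    """Return the largest size mentioned in the text, in millimetres, or None."""
    sizes = [
        float(value) * (10.0 if unit.lower() == "cm" else 1.0)
        for value, unit in SIZE_RE.findall(report_text)
    ]
    return max(sizes) if sizes else None


def agreement_metrics(tp, fp, fn, tn):
    """Compute sensitivity, specificity, PPV, NPV, and Cohen's kappa.

    tp, fp, fn, tn are counts from a 2x2 table in which manual chart review
    is the reference. The sketch assumes every row and column total is
    non-zero; it does not guard against division by zero.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


if __name__ == "__main__":
    text = "A 5 mm sessile polyp and a 1.2 cm pedunculated polyp were removed."
    print(largest_size_mm(text))
    # Made-up counts for illustration; these are not results from the study.
    print(agreement_metrics(tp=40, fp=10, fn=5, tn=45))
```

Sensitivity and specificity are computed within the reference-positive and reference-negative reports, respectively, whereas PPV and NPV are computed within the tool-positive and tool-negative reports and therefore also depend on how common the category is. This is why the abstract reports PPV and NPV separately for categories above and below 10% prevalence.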

This article is part of the Topical Collection on Education & Training

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10916-020-01604-8) contains supplementary material, which is available to authorized users.

* Lisa J. Herrinton [email protected]

1 Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA

2 Department of Gastroenterology, Kaiser Permanente Northern California, Santa Clara, CA, USA

Introduction High quality colonoscopy is critical for reducing the burden of colorectal cancer (CRC) [1, 2]. The adenoma detection rate is a widely accepted benchmark of colonoscopy quality [3, 4]. In recent years, serrated polyps (SPs) have been recognized as precursors of 20–30% of CRCs [4, 5]. Consequently, the detection rate of SPs and particularly sessile serrated adenomas (SSAs) (or sessile serrated polyps, SSPs) has been proposed as an additional benchmark of colonoscopy quality [6]. The efficacy of colonoscopy depends on high-quality examination. Automating the assessmen