(2020) 20:248

RESEARCH ARTICLE

Open Access

Can natural language processing help differentiate inflammatory intestinal diseases in China? Models applying random forest and convolutional neural network approaches

Yuanren Tong1†, Keming Lu2†, Yingyun Yang1, Ji Li1, Yucong Lin3,4, Dong Wu1, Aiming Yang1, Yue Li1*, Sheng Yu3,4,5* and Jiaming Qian1

Abstract

Background: Differentiating between ulcerative colitis (UC), Crohn's disease (CD) and intestinal tuberculosis (ITB) using endoscopy is challenging. We aimed to realize automatic differential diagnosis among these diseases through machine learning algorithms.

Methods: A total of 6399 consecutive patients (5128 UC, 875 CD and 396 ITB) who had undergone colonoscopy examinations in the Peking Union Medical College Hospital from January 2008 to November 2018 were enrolled. The input was the free-text description of the endoscopic image. Word segmentation and keyword filtering were conducted as data preprocessing. Random forest (RF) and convolutional neural network (CNN) approaches were applied to different disease entities. Three two-class classifiers (UC and CD, UC and ITB, and CD and ITB) and a three-class classifier (UC, CD and ITB) were built.

Results: The classifiers built in this research performed well, and the CNN had better performance in general. The RF sensitivities/specificities of UC-CD, UC-ITB, and CD-ITB were 0.89/0.84, 0.83/0.82, and 0.72/0.77, respectively, while the values for the CNN of CD-ITB were 0.90/0.77. The precisions/recalls of UC-CD-ITB when employing RF were 0.97/0.97, 0.65/0.53, and 0.68/0.76, respectively, and when employing the CNN were 0.99/0.97, 0.87/0.83, and 0.52/0.81, respectively.

Conclusions: Classifiers built by RF and CNN approaches had excellent performance when classifying UC with CD or ITB. For the differentiation of CD and ITB, high specificity and sensitivity were achieved as well. Artificial intelligence through machine learning is very promising in helping inexperienced endoscopists differentiate inflammatory intestinal diseases.
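The abstract reports sensitivity/specificity for the two-class classifiers and per-class precision/recall for the three-class classifier. As a reading aid, the following minimal Python sketch shows how these metrics are conventionally computed from true and predicted labels. It is not the authors' code: the function names, the choice of which disease counts as the "positive" class in a pair, and the toy labels are all assumptions made for illustration, and the toy values are unrelated to the reported results.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Two-class metrics; `positive` names the class treated as positive (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return sensitivity, specificity


def precision_recall_per_class(y_true, y_pred, classes):
    """Multi-class metrics, one (precision, recall) pair per class."""
    result = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted_c = sum(p == c for p in y_pred)  # TP + FP
        actual_c = sum(t == c for t in y_true)     # TP + FN
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        result[c] = (precision, recall)
    return result


# Hypothetical toy labels for a CD-ITB classifier, with CD taken as positive.
true_2 = ["CD", "CD", "CD", "CD", "ITB", "ITB", "ITB", "ITB"]
pred_2 = ["CD", "CD", "CD", "ITB", "ITB", "ITB", "ITB", "CD"]
sens, spec = sensitivity_specificity(true_2, pred_2, positive="CD")

# Hypothetical toy labels for the three-class UC-CD-ITB classifier.
true_3 = ["UC", "UC", "UC", "CD", "CD", "ITB"]
pred_3 = ["UC", "UC", "CD", "CD", "ITB", "ITB"]
per_class = precision_recall_per_class(true_3, pred_3, ["UC", "CD", "ITB"])
```

Note that recall for a class in the three-class setting is the same quantity as sensitivity when that class is treated as positive, so the two sets of figures in the abstract describe related but not identical views of classifier performance.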

* Correspondence: [email protected]; [email protected]
† Yuanren Tong and Keming Lu contributed equally to this work.
1 Department of Gastroenterology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
3 Center for Statistical Science, Tsinghua University, Beijing 100084, China
Full list of author information is available at the end of the article

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherw