Comparison of deep learning models for natural language processing-based classification of non-English head CT reports
DIAGNOSTIC NEURORADIOLOGY
Yiftach Barash 1,2 & Gennadiy Guralnik 3 & Noam Tau 1 & Shelly Soffer 1,2,4 & Tal Levy 2,3 & Orit Shimon 3 & Eyal Zimlichman 4 & Eli Konen 1 & Eyal Klang 1,2
Received: 27 January 2020 / Accepted: 26 March 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Purpose Natural language processing (NLP) can be used for automatic flagging of radiology reports. We assessed deep learning models for classifying non-English head CT reports.
Methods We retrospectively collected head CT reports (2011–2018). Reports were signed in Hebrew. Emergency department (ED) reports of adult patients from January to February for each year (2013–2018) were manually labeled. All other reports were used to pre-train an embedding layer. We explored two use cases: (1) general labeling use case, in which reports were labeled as normal vs. pathological; (2) specific labeling use case, in which reports were labeled as with and without intra-cranial hemorrhage. We tested long short-term memory (LSTM) and LSTM-attention (LSTM-ATN) networks for classifying reports. We also evaluated the improvement of adding Word2Vec word embedding. Deep learning models were compared with a bag-of-words (BOW) model.
Results We retrieved 176,988 head CT reports for pre-training. We manually labeled 7784 reports as normal (46.3%) or pathological (53.7%), and 7.1% with intra-cranial hemorrhage. For the general labeling, LSTM-ATN-Word2Vec showed the best results (AUC = 0.967 ± 0.006, accuracy 90.8% ± 0.01). For the specific labeling, all methods showed similar accuracies between 95.0 and 95.9%. Both LSTM-ATN-Word2Vec and BOW had the highest AUC (0.970).
Conclusion For a general use case, word embedding using a large cohort of non-English head CT reports and ATN improves NLP performance. For a more specific task, BOW and deep learning showed similar results. Models should be explored and tailored to the NLP task.
Keywords Natural language processing · Deep learning · Attention · Tomography, X-ray computed · Emergency service, hospital
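As a rough illustration of the BOW baseline and the AUC metric named in the abstract, the sketch below turns a report into token counts over a fixed vocabulary and computes AUC as the fraction of positive/negative pairs ranked correctly. It is not the authors' pipeline: the English example reports, the vocabulary, the hand-set weights, and the helper names `bag_of_words` and `roc_auc` are all hypothetical, chosen only to make the two ideas concrete. The study itself used Hebrew reports and trained models.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a report to a vector of token counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[token] for token in vocabulary]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: (report text, label), 1 = pathological, 0 = normal.
reports = [
    ("no acute intracranial abnormality", 0),
    ("acute subdural hemorrhage with midline shift", 1),
    ("normal study", 0),
    ("chronic infarct and small hemorrhage", 1),
]

# Hypothetical vocabulary and hand-set weights; a real model would learn these.
vocabulary = ["hemorrhage", "infarct", "shift", "normal"]
weights = [1.0, 1.0, 1.0, -1.0]

labels = [label for _, label in reports]
scores = [
    sum(w * x for w, x in zip(weights, bag_of_words(text, vocabulary)))
    for text, _ in reports
]
auc = roc_auc(labels, scores)
print(f"AUC on the toy reports: {auc:.3f}")
```

A BOW representation ignores word order, which may be part of why it can be competitive on a narrow task such as hemorrhage detection, where a few terms carry most of the signal, while sequence models have more to gain on the broader normal vs. pathological task.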
* Eyal Klang [email protected]
1 Division of Diagnostic Imaging, Sheba Medical Center, Sackler Faculty of Medicine, Tel Aviv University, Derech Sheba St 2, Ramat Gan, Israel
2 DeepVision Lab, Sheba Medical Center, Ramat Gan, Israel
3 Tel Aviv University, Tel Aviv, Israel
4 Management, Sheba Medical Center, Sackler Faculty of Medicine, Tel Aviv University, Ramat Gan, Israel

Introduction
Hospital emergency departments (ED) are increasingly being overwhelmed [1]. Non-contrast head computed tomography (CT) is the most frequently performed CT scan in the ED [2–4]. Flagging of reports could help prioritize patient care. Radiological reports are usually stored as unstructured free-text. This makes the extraction of data difficult [5–10]. NLP algorithms are designed to structure such free-text. The role of NLP in structuring electronic medical records (EMR) has been pr