Korean clinical entity recognition from diagnosis text using BERT


RESEARCH

Open Access

Young-Min Kim1,2* and Tae-Hoon Lee2

From The 13th International Workshop on Data and Text Mining in Biomedicine, Beijing, China. 07 November 2019

Abstract

Background: While clinical entity recognition mostly targets electronic health records (EHRs), there is also demand for handling other types of text data. Automatic medical diagnosis is an example of a new application that uses a different data source. In this work, we extract Korean clinical entities from a new medical dataset that is completely different from EHRs. The dataset is collected from an online QA site for medical diagnosis. Bidirectional Encoder Representations from Transformers (BERT), one of the best language representation models, is used to extract the entities.

Results: A slightly modified version of the BERT labeling strategy replaces the original labeling to enhance the separation of postpositions in Korean. A new clinical entity recognition dataset that we constructed, as well as a standard NER dataset, is used for the experiments. A pre-trained multilingual BERT model is used to initialize the entity recognition model. BERT significantly outperforms a character-level bidirectional LSTM-CRF, a benchmark model, on all metrics. The micro-averaged precision, recall, and F1 of BERT are 0.83, 0.85, and 0.84, whereas those of bi-LSTM-CRF are 0.82, 0.79, and 0.81, respectively. The recall of BERT is notably higher than that of the other model, which suggests that the trained BERT model detects out-of-vocabulary (OOV) words better than bi-LSTM-CRF.

Conclusions: The recently developed BERT and its WordPiece tokenization are effective for Korean clinical entity recognition. Experiments on a new dataset constructed for this purpose and on a standard NER dataset show the superiority of BERT over a state-of-the-art method. To the best of our knowledge, this work is one of the first studies dealing with clinical entity extraction from non-EHR data.

Keywords: Clinical entity recognition, BERT, Korean, Diagnosis text
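As a reading aid for the micro-averaged scores quoted in the abstract, the following is a minimal Python sketch of how micro-averaging is commonly computed: true positives, false positives, and false negatives are pooled over all entity types before precision, recall, and F1 are derived. The function name, the entity type names, and the counts are hypothetical placeholders chosen for illustration; they are not taken from the paper, and the authors' actual evaluation script may differ.

```python
from typing import Dict, Tuple

# Each entity type maps to (true positives, false positives, false negatives).
Counts = Dict[str, Tuple[int, int, int]]


def micro_prf(counts: Counts) -> Tuple[float, float, float]:
    """Pool counts over all entity types, then compute precision, recall, F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for two placeholder entity types (not from the paper).
hypothetical_counts: Counts = {
    "TYPE_A": (80, 20, 10),
    "TYPE_B": (40, 10, 20),
}

p, r, f = micro_prf(hypothetical_counts)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Because the counts are pooled first, frequent entity types weigh more heavily in the micro-average than rare ones; a macro-average would instead compute the scores per type and then take their mean.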

*Correspondence: [email protected]
1 Graduate School of Technology & Innovation Management, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul, South Korea
2 Division of Interdisciplinary Industrial Studies, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul, South Korea

Background

Clinical entity recognition traditionally targets electronic health records (EHRs) [1] generated by healthcare providers. EHRs contain clinical information about patients, including diagnoses, laboratory tests, and clinical notes [2]. The target entities are mostly technical terms precisely written by medical specialists. Medical problems, treatments, and tests are the typical entity types in these texts [3]. The extracted entities are fundamental to building clinical informatics applications [4]. Identification of patient cohorts, extraction of adverse drug events, and fin
