A semi-supervised approach for extracting TCM clinical terms based on feature words
RESEARCH
Open Access
Liangliang Liu1, Xiaojing Wu1, Hui Liu1*, Xinyu Cao2, Haitao Wang2, Hongwei Zhou3 and Qi Xie4*
From 5th China Health Information Processing Conference, Guangzhou, China. 22-24 November 2019
Abstract
Background: A semi-supervised model is proposed for extracting clinical terms of Traditional Chinese Medicine (TCM) using feature words.
Methods: The extraction model is based on BiLSTM-CRF and is combined with semi-supervised learning and a feature word set, which reduces the cost of manual annotation and improves the extraction results.
Results: Experimental results show that the proposed model improves the extraction of five types of TCM clinical terms: traditional Chinese medicines, symptoms, patterns, diseases and formulas. The best F1-value in the experiments reaches 78.70% on the test dataset.
Conclusions: This method can reduce the cost of manual labeling and improve the results of NER research on TCM clinical terms.
Keywords: TCM, NER, Clinical terms, Deep learning, Semi-supervised
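For readers unfamiliar with the metric, the F1-value quoted in the abstract is the harmonic mean of precision and recall. The sketch below shows one common way to compute it for NER at the entity level. It is an illustration only: this excerpt does not state which tagging scheme or matching criterion the authors used, so BIO tags and exact span matching are assumptions, and the labels `SYM` (symptom) and `FOR` (formula) are hypothetical names.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # Close the open span unless this tag continues it.
        if etype is not None and tag != f"I-{etype}":
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # A stray "I-" tag with no open span is ignored.
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1 with exact span and type matching (assumed criterion)."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: SYM = symptom, FOR = formula.
gold = ["B-SYM", "I-SYM", "O", "B-FOR", "I-FOR", "I-FOR"]
pred = ["B-SYM", "I-SYM", "O", "B-FOR", "I-FOR", "O"]
print(f"entity-level F1 = {entity_f1(gold, pred):.2f}")
```

In this toy case the symptom span is matched exactly, while the predicted formula span is one token too short and therefore counts as an error, so precision and recall are both 1/2.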
* Correspondence: [email protected]; [email protected]
1 School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
4 Department of Academic Management, China Academy of Chinese Medical Sciences, Beijing 100700, China
Full list of author information is available at the end of the article

Background
Named entity recognition (NER) is an important research task in natural language processing. In the field of Traditional Chinese Medicine (TCM), there is a vast number of ancient books and medical records, which contain a large number of TCM clinical terms. These terms carry rich and high-value information. TCM NER research is significant in three main ways [1]. Firstly, it is important for summarizing TCM clinical diagnosis and treatment rules, since the TCM clinical corpus contains large amounts of information on patient health, symptoms, diseases, and treatment plans based on clinical practice. Secondly, it benefits the construction of TCM expert systems, TCM knowledge graphs, and TCM QA systems [2]. Thirdly, the study of extracting TCM clinical terms promotes the construction of a standardization system for TCM clinical terms and helps to make better comparisons between TCM clinical terms and modern clinical terms [3]. However, the ancient Chinese language is used extensively in the TCM corpus, which makes TCM NER research difficult [4].
In this paper, we introduce a semi-supervised approach for extracting TCM clinical terms based on feature words. In the experiments, five types of TCM clinical terms are automatically extracted from a TCM-related corpus: traditional Chinese medicines, formulas, diseases, patterns, and symptoms. The proposed method reduces the cost of manual labeling through semi-supervised learning and improves the extraction of TCM clinical terms by using feature words. Results show that the proposed