TL-NER: A Transfer Learning Model for Chinese Named Entity Recognition
DunLu Peng¹ · YinRui Wang¹ · Cong Liu¹ · Zhang Chen¹
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract
Most current research on Named Entity Recognition (NER) in the Chinese domain assumes that annotated data are adequate. However, in many scenarios, a sufficient amount of annotated data for the Chinese NER task is difficult to obtain, which results in poor performance of machine learning methods. In view of this situation, this paper attempts to mine the information contained in massive unlabeled raw text data and use it to enhance performance on the Chinese NER task. We propose a deep learning model combined with a transfer learning technique. The method can be applied in domains where there is a large amount of unlabeled text data and only a small amount of annotated data. The experimental results show that the proposed method performs well on datasets of different sizes, and that it also avoids the errors that occur during the word segmentation process. We also evaluate the effect of transfer learning from different aspects through a series of experiments.

Keywords Transfer learning · Chinese named entity recognition · Natural language processing · Deep learning
¹ School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

1 Introduction
Named Entity Recognition (NER) is one of the core tasks in Natural Language Processing (NLP) research. It aims to automatically discover information entities in data and identify their corresponding categories (Nadeau and Sekine 2007). Efficient and accurate identification of the entity information contained in text is of great significance for the computer processing of text data. In NLP research, a number of high-level tasks, such as information retrieval, knowledge graphs, sentiment analysis (Smith et al. 2018) and question answering (Agrawal et al. 2015), need NER as one of their basic components. For example, in information retrieval, the relevant entity information must be identified from the input text to achieve accurate search results (Nemeskey and Kornai 2018); a question-answering system needs to identify entity types and their correlations in order to answer questions better. The efficiency and accuracy of NER therefore affect the subsequent tasks, and in-depth research on NER is of great value.

Compared with other languages, we observe that the Chinese NER task is more difficult, for the following reasons: 1) there is no clear boundary between adjacent words in Chinese, so one intuitive way of performing Chinese NER is to perform word segmentation first. However, incorrectly segmented entity boundaries wi