Hierarchical LSTM with char-subword-word tree-structure representation for Chinese named entity recognition

RESEARCH PAPER

October 2020, Vol. 63 202102:1–202102:15 https://doi.org/10.1007/s11432-020-2982-y

Chen GONG, Zhenghua LI*, Qingrong XIA, Wenliang CHEN & Min ZHANG

School of Computer Science and Technology, Soochow University, Suzhou 215006, China

Received 22 April 2020/Accepted 14 May 2020/Published online 16 September 2020

Abstract Chinese named entity recognition (CNER) aims to identify entity names, such as person names and organization names, in raw Chinese text, and can thus quickly extract the entity information that people care about from large-scale texts. Recent studies attempt to improve performance by integrating lexicon words into char-based CNER models. These existing studies, however, usually focus on leveraging context-free words in the lexicon without considering the contextual information of words and subwords in the sentences. To address this issue, in addition to utilizing the lexicon words, we further propose to construct a hierarchical tree-structure representation, composed of characters, subwords, and context-aware words predicted by a segmentor, to represent each sentence for CNER. Based on the tree-structure representation, we propose a hierarchical long short-term memory (HiLSTM) framework, which consists of a hierarchical encoding layer, a fusion layer, and a CRF layer, to capture linguistic knowledge at different levels. On the one hand, the interactions within each level help to obtain the contextual information. On the other hand, the propagation from the lower levels to the upper levels can provide additional semantic knowledge for CNER. Experimental results on three widely used CNER datasets show that our proposed HiLSTM model achieves significant improvement over several strong benchmark methods.

Keywords natural language processing, named entity recognition, representation learning, neural networks

Citation Gong C, Li Z H, Xia Q R, et al. Hierarchical LSTM with char-subword-word tree-structure representation for Chinese named entity recognition. Sci China Inf Sci, 2020, 63(10): 202102, https://doi.org/10.1007/s11432-020-2982-y

1 Introduction

As a fundamental task in natural language processing (NLP), the purpose of named entity recognition (NER) is to identify named entities from raw texts, such as person names, organization names, and location names. Named entities are indispensable for many down-stream NLP applications, such as information retrieval [1], relation extraction [2], and question answering [3]. For example, in the medical field, identifying entities such as disease names, symptom names, and medicine names from the electronic medical records allows doctors to quickly understand the health status and the treatment of patients, which is helpful for decision-making [4]; further extracting the relations between the entities can be used to study the similarities between different patients and to find the contraindications to medicine use [5]; NER is also
