Improving biomedical named entity recognition with syntactic information


RESEARCH ARTICLE

Open Access

Yuanhe Tian1†, Wang Shen2†, Yan Song3,4*, Fei Xia1, Min He2 and Kenli Li2

*Correspondence: Yan Song
† Yuanhe Tian and Wang Shen contributed equally to this work.
3 The Chinese University of Hong Kong, Shenzhen, China
Full list of author information is available at the end of the article.

Abstract

Background: Biomedical named entity recognition (BioNER) is an important task for understanding biomedical texts, but it can be challenging due to the lack of large-scale labeled training data and domain knowledge. To address this challenge, in addition to using powerful encoders (e.g., biLSTM and BioBERT), one possible method is to leverage extra knowledge that is easy to obtain. Previous studies have shown that auto-processed syntactic information can be a useful resource to improve model performance, but their approaches are limited to directly concatenating the embeddings of syntactic information to the input word embeddings. As a result, such syntactic information is leveraged inflexibly, and inaccurate syntactic information may hurt model performance.

Results: In this paper, we propose BioKMNER, a BioNER model that uses key-value memory networks (KVMN) to incorporate auto-processed syntactic information. We evaluate BioKMNER on six English biomedical datasets, where our method with KVMN outperforms the strong baseline from the previous study, BioBERT, on all datasets. Specifically, the F1 scores of our best-performing model are 85.29% on BC2GM, 77.83% on JNLPBA, 94.22% on BC5CDR-chemical, 90.08% on NCBI-disease, 89.24% on LINNAEUS, and 76.33% on Species-800, with state-of-the-art performance obtained on four of them (i.e., BC2GM, BC5CDR-chemical, NCBI-disease, and Species-800).

Conclusion: The experimental results on six English benchmark datasets demonstrate that auto-processed syntactic information can be a useful resource for BioNER and that our method with KVMN can appropriately leverage such information to improve model performance.

Keywords: Named entity recognition, Text mining, Key-value memory networks, Syntactic information, Neural networks
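The abstract contrasts two ways of using auto-processed syntactic information: concatenating its embedding directly to the word embedding, and reading it through a key-value memory. The following is a minimal, hypothetical sketch in pure Python that assumes the standard key-value memory read: a word's hidden vector is scored against each key embedding by dot product, the scores are normalized with a softmax, and the value embeddings are summed with those weights. The function names, the toy vectors, what the keys and values stand for (e.g., context words and their syntactic labels), and the choice to concatenate the memory output with the hidden vector are all assumptions for illustration, not BioKMNER's exact formulation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_baseline(word_emb: Sequence[float], syn_emb: Sequence[float]) -> Vector:
    """Prior approach described in the abstract: append the syntactic
    embedding to the word embedding. The label is used unconditionally,
    whether or not the automatic parser got it right."""
    return list(word_emb) + list(syn_emb)


def memory_weights(h: Sequence[float], keys: Sequence[Sequence[float]]) -> Vector:
    """Attention weight of each memory slot for one word: softmax_j(h . k_j)."""
    return softmax([dot(h, k) for k in keys])


def memory_read(h: Sequence[float],
                keys: Sequence[Sequence[float]],
                values: Sequence[Sequence[float]]) -> Vector:
    """Weighted sum of value embeddings: o = sum_j p_j * v_j.

    h:      hidden vector of the current word (e.g., from a BioBERT-style encoder)
    keys:   embeddings of context features linked to the word (hypothetical choice)
    values: embeddings of the syntactic labels paired with those keys
    """
    weights = memory_weights(h, keys)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def kvmn_layer(h: Sequence[float],
               keys: Sequence[Sequence[float]],
               values: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical combination step: concatenate h with the memory output."""
    return list(h) + memory_read(h, keys, values)


if __name__ == "__main__":
    h = [1.0, 0.0]                     # toy hidden vector for one word
    keys = [[2.0, 0.0], [0.0, 2.0]]    # slot 0 matches h, slot 1 does not
    values = [[1.0, 0.0], [0.0, 1.0]]  # toy syntactic-label embeddings
    print(memory_weights(h, keys))     # slot 0 receives most of the weight
    print(kvmn_layer(h, keys, values))
    print(concat_baseline(h, values[1]))
```

In this toy run, the first key matches the hidden vector better, so its value receives about 0.88 of the attention weight, while the poorly matching slot contributes little. This illustrates the intuition behind weighting syntactic information through a memory instead of concatenating every label unconditionally.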

Background

Biomedical named entity recognition (BioNER) is an important and challenging task for understanding biomedical texts. It aims to recognize named entities (NEs), such as diseases, genes, and species, in biomedical texts and plays an important role in many downstream natural language processing (NLP) tasks, such as the drug-drug interaction task [21, 34] and knowledge base completion [38, 47]. Compared to named entity recognition in …

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
