Utilizing external corpora through kernel function: application in biomedical named entity recognition
REGULAR PAPER
Rakesh Patra1 · Sujan Kumar Saha1
Received: 25 October 2019 / Accepted: 12 May 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Performance of word sequence labelling tasks such as named entity recognition and part-of-speech tagging largely depends on the features chosen for the task. However, representing a word and capturing its characteristics adequately through a set of features is difficult in general. Moreover, external resources often become essential for building a high-performance system, but acquiring the required knowledge demands domain-specific processing and feature engineering. Kernel functions, used with a support vector machine, offer an alternative way to capture similarity between words more efficiently using both the local context and external corpora. In this paper, we aim to compute similarity between words using their context information, syntactic information and occurrence statistics in external corpora. This similarity value is computed through a kernel function. The proposed kernel function combines two sub-kernels: one captures global information through word co-occurrence statistics accumulated from a large corpus, and the other captures local semantic information of the words through word-specific parse tree fragments. We evaluate the proposed kernel on the JNLPBA 2004 Biomedical Named Entity Recognition and the BioCreative II 2006 Gene Mention Recognition data sets. In our experiments, we observe that the proposed method is effective on both data sets.

Keywords Support vector machines · Kernel function · Named entity recognition · Biomedical informatics · Feature extraction
Correspondence: Sujan Kumar Saha [email protected] · Rakesh Patra [email protected]

1 Department of Computer Science, Birla Institute of Technology, Mesra, Ranchi, Jharkhand 835215, India

1 Introduction

The support vector machine [7] is one of the most popular classifiers used in various sequence labelling tasks. A support vector machine (SVM) finds a decision hyperplane between the competing classes using a subset of training samples (the support vectors) from both classes that lie closest to the decision hyperplane. A linear classifier requires a set of features to map the input samples into a Hilbert space, where it finds the decision boundary between the competing classes. Composing a suitable feature set requires extensive trial and error. Kernel functions offer an alternative: they can use the given feature set to obtain a more linearly separable sample space by mapping the samples into higher-dimensional vectors. Utilizing kernel functions to make better use of extracted features, or proposing new task-specific kernel functions for effective computation of similarity between instances, has been an active research area over the last two decades. We found the trend of utilizing kernel functions for novel appl…
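To make the idea of combining sub-kernels concrete, the following is a minimal sketch, not the authors' actual formulation or data. It assumes scikit-learn and uses toy, hand-made co-occurrence vectors and parse-tree fragment sets; the word inventory, fragment strings and the weight alpha are all illustrative. It relies on the standard fact that a weighted sum of two valid kernels is itself a valid kernel.

```python
# Sketch: an SVM with a combined kernel built from two sub-kernels,
# one "global" (corpus co-occurrence statistics) and one "local"
# (overlap of parse-tree fragments). All data below is illustrative.

import numpy as np
from sklearn.svm import SVC

# Toy co-occurrence vectors per word, as if gathered from an external corpus.
cooc = {
    "protein":   np.array([3.0, 1.0, 0.0]),
    "kinase":    np.array([2.0, 2.0, 0.0]),
    "patient":   np.array([0.0, 1.0, 4.0]),
    "treatment": np.array([0.0, 0.0, 5.0]),
}

# Toy parse-tree fragment sets describing each word's local syntactic context.
fragments = {
    "protein":   {"NP->NN", "NN"},
    "kinase":    {"NP->NN", "NN", "NP->JJ NN"},
    "patient":   {"NP->DT NN", "NN"},
    "treatment": {"NP->DT NN", "NN", "VP->VB NP"},
}

words = list(cooc)                       # fixed word inventory
labels = np.array([1, 1, 0, 0])          # 1 = gene/protein mention, 0 = other

def global_kernel(w1, w2):
    """Cosine similarity of corpus co-occurrence vectors (global sub-kernel)."""
    a, b = cooc[w1], cooc[w2]
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def local_kernel(w1, w2):
    """Jaccard overlap of parse-tree fragment sets (local sub-kernel)."""
    f1, f2 = fragments[w1], fragments[w2]
    return len(f1 & f2) / len(f1 | f2)

def combined_kernel(X, Y, alpha=0.5):
    """Weighted sum of the two sub-kernels; X and Y hold word indices."""
    gram = np.zeros((len(X), len(Y)))
    for i, xi in enumerate(X[:, 0].astype(int)):
        for j, yj in enumerate(Y[:, 0].astype(int)):
            w1, w2 = words[xi], words[yj]
            gram[i, j] = (alpha * global_kernel(w1, w2)
                          + (1 - alpha) * local_kernel(w1, w2))
    return gram

# scikit-learn's SVC accepts a callable kernel returning the Gram matrix.
X = np.arange(len(words)).reshape(-1, 1)   # each sample is the index of a word
clf = SVC(kernel=combined_kernel)
clf.fit(X, labels)
print(clf.predict(X))
```

Because the classifier only ever sees the Gram matrix, the feature engineering effort shifts from hand-crafting explicit feature vectors to defining the two similarity functions, which is the alternative route the paper pursues.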