TermInformer: unsupervised term mining and analysis in biomedical literature


S.I.: DATA FUSION IN THE ERA OF DATA SCIENCE

Prayag Tiwari1 • Sagar Uprety2 • Shahram Dehdashti3 • M. Shamim Hossain4

Received: 17 June 2020 / Accepted: 2 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract Terminology is the most basic information that researchers and literature analysis systems need to understand. Mining terms and revealing the semantic relationships between them can help biomedical researchers find solutions to major health problems and motivate them to explore new biomedical research questions. However, mining terms from biomedical literature remains a challenge. Research on text segmentation in natural language processing (NLP) has not yet been applied well in the biomedical field. Named entity recognition models usually require a large training corpus, and the types of entities that a model can recognize are limited. Dictionary-based methods match the text against pre-established vocabularies, but they can only match terms in a specific field, and collecting the terms is time-consuming and labour-intensive. Many scenarios in biomedical research are unsupervised, i.e. the corpora are unlabelled and the system may have little prior knowledge. This paper proposes the TermInformer project, which aims to mine the meaning of terms in an open fashion through computation over terms, and to help find solutions to some of the significant problems in our society. We propose an unsupervised method that automatically mines terms from text without relying on external resources. The method is general and can be applied to any document collection. Combined with a word vector training algorithm, it yields reusable term embeddings that can be used in any downstream NLP application. We compare these term embeddings with existing word embeddings, and the results show that our method better reflects the semantic relationships between terms. Finally, we use the proposed method to find potential factors and treatments for lung cancer, breast cancer, and coronavirus.
Keywords Term mining · Unsupervised learning · Term embeddings · Sequence labelling · GloVe · Biomedical literature
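The abstract's idea of combining mined terms with a word vector training algorithm can be illustrated with a minimal sketch. This is not the authors' implementation: the term list is assumed to be given (the mining step itself is not shown), the helper names `merge_terms` and `cooccurrence` are hypothetical, and the co-occurrence count is a simplified, unweighted stand-in for the statistics that an algorithm such as GloVe would consume.

```python
from collections import Counter


def merge_terms(tokens, terms):
    """Greedily replace the longest matching multiword term with one token."""
    term_set = {tuple(t.split()) for t in terms}
    max_len = max((len(t) for t in term_set), default=1)
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first so a nested shorter term
        # does not pre-empt the longer term that contains it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if n > 1 and span in term_set:
                merged.append("_".join(span))
                i += n
                break
        else:
            # No multiword term starts here; keep the single token.
            merged.append(tokens[i])
            i += 1
    return merged


def cooccurrence(tokens, window=2):
    """Count symmetric co-occurrences of tokens within a fixed window."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), i):
            counts[(tokens[j], word)] += 1
            counts[(word, tokens[j])] += 1
    return counts


# Hypothetical input: a toy sentence and an assumed list of mined terms.
tokens = "non small cell lung cancer is a type of lung cancer".split()
terms = ["lung cancer", "non small cell lung cancer"]
merged = merge_terms(tokens, terms)
counts = cooccurrence(merged)
```

Once each term is a single vocabulary item, any embedding trainer that works on token sequences or co-occurrence counts would assign it one vector, which is what makes a term embedding directly comparable with an ordinary word embedding.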

✉ Prayag Tiwari [email protected]
✉ M. Shamim Hossain [email protected]
Sagar Uprety [email protected]
Shahram Dehdashti [email protected]

1 Department of Information Engineering, University of Padova, Padua, Italy

2 The Open University, London, UK

3 School of Information Systems, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia

4 Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia


Neural Computing and Applications

1 Introduction

Term mining aims to mine terms from unstructured documents. Terminology is usually composed of multiple
