BIOINTMED: integrated biomedical knowledge base with ontologies and clinical trials


ORIGINAL ARTICLE

Ankita Saha1 · Jayanta Mukhopadhyay2 · Sudeshna Sarkar2 · Mahanandeeshwar Gattu3

Received: 21 August 2019 / Accepted: 22 May 2020 © International Federation for Medical and Biological Engineering 2020

Abstract
Biomedical data are complex and heterogeneous. An ample quantity of reliable data is important for understanding and exploring the domain. This work integrates biomedical data from heterogeneous sources, such as dictionaries and corpora, and consolidates them into a uniform format for easier access by end users such as biologists, pharmacists, and data scientists. The proposed integrated biomedical knowledge base, BIOINTMED, has 11,299, 12,981, 4,428, 61,491, 48,663, and 13,146 unique entities for drugs, diseases, targets, genes, biomedical pathways, and adverse events, respectively. The uniform aggregated collection is also used to study the interactions among pairs of these entities. Finally, a complete statistical analysis of the consolidated biomedical entities is provided.

Keywords Biomedical knowledge base · Data integration · Biomedical integrated system · Dictionaries · Clinical trials
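As a rough illustration of the kind of integration the abstract describes, the following Python sketch merges records from two differently shaped sources into one uniform (entity type, name) form and counts the unique entities per type. The source layouts, field names, sample records, and the normalization rule are all assumptions made for illustration; they are not the authors' pipeline or the actual BIOINTMED schema.

```python
from collections import defaultdict

# Hypothetical records from two differently shaped sources
# (illustrative only, not real BIOINTMED data).
SOURCE_A = [
    {"drug_name": "Aspirin", "indication": "Pain"},
    {"drug_name": "Metformin", "indication": "Type 2 diabetes"},
]
SOURCE_B = [
    ("drug", "aspirin "),
    ("disease", "pain"),
    ("gene", "TP53"),
]


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so simple spelling variants match."""
    return " ".join(name.lower().split())


def from_source_a(rows):
    """Map source A's column names onto (entity_type, name) pairs."""
    for row in rows:
        yield "drug", row["drug_name"]
        yield "disease", row["indication"]


def integrate(*streams):
    """Merge (entity_type, name) pairs into one set of unique names per type."""
    merged = defaultdict(set)
    for stream in streams:
        for entity_type, name in stream:
            merged[entity_type].add(normalize(name))
    return merged


merged = integrate(from_source_a(SOURCE_A), SOURCE_B)
counts = {entity_type: len(names) for entity_type, names in merged.items()}
print(counts)
```

A real integration would need far more careful matching (synonyms, identifiers, ontology mappings) than the simple string normalization assumed here; the sketch only shows how a uniform format makes per-type unique counts straightforward to compute.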

1 Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur, India
2 Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
3 Excelra Knowledge Solutions Pvt Ltd, Hyderabad, India

1 Introduction
Biomedical knowledge has a high impact on humankind and society, and it plays a significant role in pharmaceutical research and industry. The data involved are complex and heterogeneous, which makes them challenging to understand and explore. In recent years, the growing amount of biomedical data has created a rising opportunity to explore and learn about their interactions and effects. Data preparation is always an enormous and time-consuming task in the biomedical field. The process involves assembling data from multiple reliable sources, analysing their statistics, and processing them into a specific format that satisfies the requirements of the associated biomedical problem. Drug discovery, entity recognition, relation extraction, and adverse drug reaction detection are some of the crucial problems in this field. Data play an important role in understanding and exploring the research opportunities in any domain. There is therefore a need for appropriate tools to investigate the data, especially when they are voluminous, as in the biomedical domain. There have been several efforts to develop tools for exploring large volumes of data. Pinero et al. [1] integrated various gene-related data and diseases with expert-curated knowledge from genome-wide association study catalogues, animal models, and scientific literature int