Literature mining for context-specific molecular relations using multimodal representations (COMMODAR)



RESEARCH

Open Access

Jaehyun Lee¹, Doheon Lee¹,²* and Kwang Hyung Lee¹*

From The 13th International Workshop on Data and Text Mining in Biomedical Informatics, Beijing, China, 3-7 November 2019

Abstract: Biological contextual information helps us understand various phenomena occurring in biological systems, which consist of complex molecular relations. The construction of context-specific relational resources relies heavily on laborious manual extraction from unstructured literature. In this paper, we propose COMMODAR, a machine learning-based literature mining framework for context-specific molecular relations using multimodal representations. The main idea of COMMODAR is feature augmentation through the cooperation of multimodal representations for relation extraction. We leveraged biomedical domain knowledge as well as canonical linguistic information to obtain more comprehensive representations of textual sources. The models based on multiple modalities outperformed those based solely on the linguistic modality. We applied COMMODAR to 14 million PubMed abstracts and extracted 9214 context-specific molecular relations. All corpora, extracted data, evaluation results, and the implementation code are downloadable at https://github.com/jae-hyun-lee/commodar.

CCS Concepts: • Computing methodologies~Information extraction • Computing methodologies~Neural networks • Applied computing~Biological networks.

Keywords: Biological context, Literature mining, Natural language processing, Representation learning
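The abstract's central idea, augmenting relation-extraction features by combining several modalities, can be illustrated with a minimal sketch. It assumes each modality has already produced fixed-length vectors: word embeddings for the linguistic modality and entity embeddings standing in for biomedical domain knowledge. Each modality is averaged, the two parts are concatenated, and a logistic function scores the result. The function names (`mean_vector`, `augment_features`, `relation_score`), the toy vectors, and the concatenation-plus-logistic design are hypothetical choices for illustration, not COMMODAR's actual architecture; the authors' repository linked in the abstract holds the real implementation.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Element-wise mean of equal-length vectors; zeros if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def augment_features(
    tokens: Sequence[str],
    entities: Sequence[str],
    word_emb: Dict[str, List[float]],
    entity_emb: Dict[str, List[float]],
    dim_word: int,
    dim_entity: int,
) -> List[float]:
    """Hypothetical multimodal feature vector for one sentence.

    Linguistic modality: mean of the word embeddings of known tokens.
    Knowledge modality: mean of the embeddings of the tagged entities.
    The two parts are concatenated (feature augmentation).
    """
    linguistic = mean_vector([word_emb[t] for t in tokens if t in word_emb], dim_word)
    knowledge = mean_vector([entity_emb[e] for e in entities if e in entity_emb], dim_entity)
    return linguistic + knowledge


def relation_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic score that the sentence expresses the target relation."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy, hypothetical embeddings for illustration only.
WORD_EMB = {"increases": [1.0, 0.0], "activity": [0.0, 1.0]}
ENTITY_EMB = {"PROTEIN_A": [0.5, 0.5, 0.0], "PROTEIN_B": [0.0, 0.5, 0.5]}

features = augment_features(
    tokens=["PROTEIN_A", "increases", "the", "activity", "of", "PROTEIN_B"],
    entities=["PROTEIN_A", "PROTEIN_B"],
    word_emb=WORD_EMB,
    entity_emb=ENTITY_EMB,
    dim_word=2,
    dim_entity=3,
)
print(features)  # [0.5, 0.5, 0.25, 0.5, 0.25]
print(relation_score(features, weights=[0.0] * 5, bias=0.0))  # 0.5
```

Concatenation keeps each modality in its own coordinates, so a downstream classifier can learn how much weight to give linguistic versus domain-knowledge evidence; a linguistic-only baseline corresponds to dropping the knowledge part of the vector.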

* Correspondence: [email protected]; [email protected]
¹ Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Full list of author information is available at the end of the article

Background

Complex biological systems are known to consist of coordinated molecular interactions, and the relationships between molecules consequently determine the behavior of the entire system. Molecular network models are often considered valuable for elucidating the organizing principles of biological systems and for promoting public health. For example, biological networks are of pharmacological interest as an aid to predicting side effects or multi-target drug efficacy. In pursuit of such network models, biomedical researchers have increasingly depended on informatics resources that provide various patterns of molecular relations [1]. Yoon et al. integrated pathway resources composed of relations between biological molecules and showed that information from different resources was sometimes contradictory [2]. For instance, one database states that a protein A INCREASES the activity of a protein B, whereas another states that protein A DECREASES the activity of protein B. Yoon et al. partially attributed these discrepancies to the lack of contextual information specifying the biological circumstance of the