Exploiting Latent Semantic Subspaces to Derive Associations for Specific Pharmaceutical Semantics
Janus Wawrzinek¹ · José María González Pinto¹ · Oliver Wiehr¹ · Wolf-Tilo Balke¹
Received: 30 June 2020 / Accepted: 12 August 2020
© The Author(s) 2020
Abstract
State-of-the-art approaches in the field of neural embedding models (NEMs) enable progress in the automatic extraction and prediction of semantic relations between important entities like active substances, diseases, and genes. In particular, the prediction property is making them valuable for important research-related tasks such as hypothesis generation and drug repositioning. A core challenge in the biomedical domain is to have interpretable semantics from NEMs that can distinguish, for instance, between the following two situations: (a) drug x induces disease y and (b) drug x treats disease y. However, NEMs alone cannot distinguish between associations such as treats or induces. Is it possible to develop a model to learn a latent representation from the NEMs capable of such disambiguation? To what extent do we need domain knowledge to succeed in the task? In this paper, we answer both questions and show that our proposed approach not only succeeds in the disambiguation task but also advances current growing research efforts to find real predictions using a sophisticated retrospective analysis. Furthermore, we investigate which type of associations is generally better contextualized and therefore probably has a stronger influence in our disambiguation task. In this context, we present an approach to extract an interpretable latent semantic subspace from the original embedding space in which therapeutic drug–disease associations are more likely.

Keywords
Digital libraries · Association mining · Information extraction · Neural embeddings · Semantic enrichment · Semantic subspaces
¹ Institute for Information Systems, TU-Braunschweig, Mühlenpfordstrasse 23, 38106 Braunschweig, Germany
Correspondence: Janus Wawrzinek · Oliver Wiehr: wiehr@tu-bs.de

1 Introduction
Today's digital libraries have to manage the exponential growth of scientific publications [1], which results in ever faster-growing data holdings. To illustrate the effects of this growth, consider Sara, a young scientist in the pharmaceutical field who is looking for drugs related to "Diabetes". Her goal is to formulate a new hypothesis that links an existing drug to "Diabetes" through an association that has not yet been discovered (that is, not yet published in a paper). This is a complex information need: a term-based search in the digital library PubMed leads to 39,000 hits for the year 2019 alone. Given this volume of data, Sara will have to dedicate considerable time to analysing each paper, and to take further steps, before her information need is satisfied. We therefore believe that innovative access paths beyond term-based search are necessary. One of the most effec