Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature


Journal of Cheminformatics Open Access

RESEARCH ARTICLE

Pedro Ruas*, Andre Lamurias and Francisco M. Couto

Abstract

Background: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is becoming increasingly costly and inefficient due to information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but they perform poorly when the disambiguation graphs are sparse.

Findings: This work proposes a Named Entity Linking framework, Relation Extraction for Entity Linking (REEL), that uses automatically extracted relations to overcome this limitation. Our method builds a disambiguation graph in which the nodes are the ontology candidates for the entities and the edges are added according to the relations established in the text, which the method extracts automatically. The PPR algorithm and the information content of each ontology are then applied to choose, for each entity, the candidate that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases) and the subset of the BC5CDR corpus with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). The F1-scores achieved by REEL were 85.8%, 80.9% and 90.3% on these gold standards, respectively, outperforming baseline approaches.

Conclusions: We demonstrated that Relation Extraction (RE) tools can improve Named Entity Linking by capturing semantic information that is expressed in text but missing from knowledge bases, and by using it to improve the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline and potentially to any domain, as long as an ontology or other knowledge base is available.

Keywords: Named Entity Linking, Relation extraction, PageRank, Ontologies, Text mining

Introduction

Background

*Correspondence: [email protected] LASIGE, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisbon, Portugal

The amount of scientific literature available has grown intensely, mainly in the form of scientific articles, whose content is mostly expressed in natural language. For instance, there are more than 30 million articles in the PubMed repository [1], which is one of the most used libraries in the Life Sciences and Biomedical domains. This information overload creates problems for researchers who want to retrieve information, because they need to spend more time and effort to find the articles relevant to their work. Simultaneously, the number of online resources of biological information has also been rising, as is the case with domain ontologies. Domain ontologies provide a coherent representation of the knowledge in a specific scientific field, allowing a standardised nomenclature