IICE: Web Tool for Automatic Identification of Chemical Entities and Interactions


1 Faculdade de Ciências, BioISI: Biosystems & Integrative Sciences Institute, Universidade de Lisboa, Lisboa, Portugal
2 LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
[email protected], [email protected], [email protected]

Abstract. Automatic methods are being developed and applied to transform textual biomedical information into machine-readable formats. Machine learning techniques have been a prominent approach to this problem. However, there is still a lack of systems that are easily accessible to users. For this reason, we developed a web tool to facilitate access to our text mining framework, IICE (Identifying Interactions between Chemical Entities). This tool annotates the input text with chemical entities and identifies the interactions described between these entities. Various options are available to control the algorithms employed by the framework and the output formats.

Keywords: Text mining · Machine learning · Ontologies · Named entity recognition · Relation extraction

1 Introduction

The amount of information about chemical compounds that is published in the form of scientific literature is growing at an unprecedented rate [1]. Updating the chemical interactions described in databases such as DrugBank [4] and IntAct [3] relies on manually reading and parsing the literature. As a result, these updates will always lag behind scientific publications, since experts must first extract the relevant information from the papers. For this reason, there is a growing need for automatic methods that transform biomedical text into machine-readable structured data, such as an interaction between compounds.

Information extraction systems applied to the biomedical domain have been developed and are available to the community [5]. However, their performance depends on the user's machine, and they usually require external libraries and specific installation instructions. A more practical solution is to release the system as a web tool, with a front-end that enables any user to test and experiment with it.

© Springer International Publishing Switzerland 2015. A. Bifet et al. (Eds.): ECML PKDD 2015, Part III, LNAI 9286, pp. 285–288, 2015. DOI: 10.1007/978-3-319-23461-8_31 (A. Lamurias et al.)

We developed the IICE framework (Identifying Interactions between Chemical Entities) for automatic annotation of biomedical documents. IICE is based on supervised machine learning algorithms and semantic similarity between ontology concepts. We have evaluated the framework with the CHEMDNER dataset [7], for the recognition of chemical entities, and with the DDIExtraction dataset [8], for the extraction of drug-drug interactions. The F-measures obtained on these datasets were 78.26% and 72.52%, respectively, which can be considered nearly state-of-the-art. The IICE framework can be accessed through a web tool¹, with several configuration options available to the user. These options enable the user to obtain different results by adjusting the