


ORIGINAL RESEARCH

Impact of the Continuous Evolution of Gene Ontology on the Performance of Similarity Measures for Scoring Confidence of Protein Interactions

Madhusudan Paul1,2 · Ashish Anand1 · Saptarshi Pyne1

Received: 16 March 2020 / Accepted: 28 September 2020
© Springer Nature Singapore Pte Ltd 2020

Abstract
Gene ontology (GO) is a comprehensive resource describing the properties of gene products and their relationships. A similarity measure between two gene products can be defined using GO, and the resulting similarity score can be treated as the likelihood that the two products physically interact. However, GO is updated regularly through the addition of new terms and the removal or merging of obsolete terms. Therefore, the similarity score of an interaction may differ from one instance of GO to another. In this paper, we systematically study the impact of the continuous evolution of GO on the performance of similarity measures for the task of scoring the confidence of protein–protein interactions (PPIs). We find that the performance of a similarity measure is affected by the continuous evolution of GO. We further observe that the degree of robustness of a similarity measure is highly influenced by the particular setting we consider.

Keywords  Gene ontology (GO) · Protein–protein interaction (PPI) · Similarity measures

This article is part of the topical collection “Computational Biology and Biomedical Informatics” guest edited by Dhruba Kr Bhattacharyya, Sushmita Mitra and Jugal Kr Kalita.

* Madhusudan Paul [email protected]

1 Department of Computer Science and Engineering, IIT Guwahati, Guwahati, India
2 Department of Computer and System Sciences, Visva-Bharati, Santiniketan, India

Introduction
Gene ontology (GO) [2] has become a de facto standard for assessing the functional relationships among gene products. GO is a taxonomy of biological terms related to the properties of genes and/or gene products (e.g., proteins), organized as a directed acyclic graph (DAG) that represents the relationships among the terms. There are three GOs: biological process (BP), cellular component (CC), and molecular function (MF). Genes or gene products in different model organisms are annotated to GO terms based on different sources of evidence. Since gene products are not directly represented in GO, annotation corpora are used to link gene products to GO terms. An annotation corpus of a species (e.g., yeast) is an association between gene products of the species and GO terms.

An ontology-based semantic similarity measure (SSM) is a quantitative function, SSM(t1, t2), that measures the closeness between two terms t1 and t2 based on their semantic representations in a given ontology. Mathematically, it is a function of two ontology terms (or two sets of ontology terms) that returns a numeric value reflecting their closeness in semantic meaning [33]. SSMs were originally defined in the study of linguistics. Lord et al. [23] did the first pioneering work by utilizing the ontology-based SSM i