Using prior knowledge in the inference of gene association networks


Isabel A. Nepomuceno-Chamorro¹ · Juan A. Nepomuceno¹ · José Luis Galván-Rojas¹ · Belén Vega-Márquez¹ · Cristina Rubio-Escudero¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Traditional computational techniques for gene expression data analysis are increasingly being improved with prior biological knowledge from open-access repositories. In this work, we propose using prior knowledge as a heuristic in a method for inferring gene-gene associations from gene expression profiles. As the source of prior knowledge we use Gene Ontology, an open-access ontology in which genes are annotated according to their biological functionality, together with a pairwise Gene-Ontology-based measure between genes. We compared the performance of our proposal with benchmark methods for the inference of gene networks: it outperforms them in some cases and obtains similar, competitive results in others, with the added advantage of providing simple and interpretable models, a feature that the European Union has stated to be desirable for health-related Artificial Intelligence models.

Keywords Gene-gene association networks · Ontology · Semantic similarity measure · Information fusion · Microarray data analysis

Corresponding author: Isabel A. Nepomuceno-Chamorro, [email protected]

¹ Dpto. Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Sevilla, Spain

1 Introduction

The amount of data produced by biotechnology techniques has grown exponentially in recent years [14]. High-throughput technologies now offer scientists in biology and biomedicine the opportunity to gain a better understanding of the behaviour of genes, for example by identifying novel gene-gene associations, gene expression patterns or candidate genes in disease [19]. Microarray technology can monitor changes in RNA (ribonucleic acid) abundance for thousands of genes simultaneously. After the preprocessing steps known as low-level microarray data analysis [28], these measurements can be represented as a numerical matrix in which the rows correspond to genes, the columns to experimental conditions, and each entry is the expression value of a gene under a condition.

In the field of gene expression data analysis, novel strategies are required to handle this huge amount of data and to infer knowledge such as gene regulatory networks. To infer gene regulatory networks, the first step is to extract direct regulatory relationships between genes, i.e., gene-gene associations. The inference of gene-gene associations is based on the concept of guilt-by-association: gene co-expression implies gene co-regulation, i.e., groups of genes that show similar expression profiles also share the same regulatory regime or functionality. Co-expression networks are typically generated using co-expression methods, where each pair of genes is analyzed using correlation statistics as pairwise similarit