cPEA: a parallel method to perform pathway enrichment analysis using multiple pathways databases

METHODOLOGIES AND APPLICATION

Giuseppe Agapito1 · Mario Cannataro2

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Genes and proteins are essential to activating or inhibiting biological pathways both inside and outside the cells of every living organism. The key to understanding the functional roles of genes/proteins is deducing the relationships between pathways and genes/proteins. Pathway enrichment analysis (PEA), an essential method in omics research, can be used to identify the role of genes/proteins in a biological context. A large number of PEA methods and tools are available; nevertheless, only a few can perform PEA exploiting information from multiple databases in the same analysis. Many of these databases were initially developed to use their own pathway representation format, resulting in a heterogeneous collection of resources that are extremely difficult to combine and use. Soft computing enables approximate solutions to problems that are challenging to solve precisely, such as merging and integrating structured and unstructured data, or data from different databases. Integrating and merging biological pathways from diverse data sources is challenging because of the different pathway data representations used. Parallel preprocessing methods that deal with approximation and imprecision can help integrate heterogeneous pathway data. We implemented an automatic methodology to perform PEA using pathways from different databases, together with a method to compute topological scores to rank enriched pathways. This methodology is available in a software framework called cross-pathway enrichment analysis (cPEA). The results show good performance in terms of execution time and reduced memory consumption, making it possible to improve PEA by using pathways from different databases.

Keywords Parallel computing · Statistical analysis · Pathway enrichment analysis · Gene expression · SNP

Communicated by V. Loia.

Giuseppe Agapito [email protected]
Mario Cannataro [email protected] http://dsmc.unicz.it/personale/docente/mariocannataro

1 Department of Legal, Economic and Social Sciences, and Data Analytics Research Center, University “Magna Græcia” of Catanzaro, Catanzaro, Italy

2 Department of Medical and Surgical Sciences, and Data Analytics Research Center, University “Magna Græcia” of Catanzaro, Catanzaro, Italy

1 Introduction

After the sequencing of the whole DNA (Collins et al. 2003), which took place a few decades ago, it appears that only a small part of the DNA, about 5%, is coding, while the remaining portion, about 95%, has no agreed meaning (i.e. a role in the various biological processes). The sequencing of a complete genome was also made possible by the development of high-throughput (HT) methodologies. HT assays such as microarrays and next-generation sequencing (NGS) produce vast amounts of data