A Survey of Computational Methods for Protein Function Prediction


Abstract Rapid advances in high-throughput genome sequencing technologies have resulted in millions of protein-encoding gene sequences with no functional characterization. Automated protein function annotation, or prediction, is a prime problem for computational methods to tackle in the post-genomic era of big molecular data. While recent community-driven experiments demonstrate that the accuracy of function prediction methods has significantly improved, challenges remain. These challenges relate to the different sources of data exploited to predict function, as well as to different choices in representing and integrating heterogeneous data. Current methods predict function from a protein's sequence, often in the context of evolutionary relationships; from a protein's three-dimensional structure or specific patterns in the structure; from neighbors in a protein–protein interaction network; from microarray data; or from a combination of these different types of data. Here we review these methods and the state of protein function prediction, emphasizing recent algorithmic developments, remaining challenges, and prospects for future research.

Keywords Computational biology • Protein function prediction • Algorithms • Machine learning • Homology

A. Shehu
Department of Computer Science, Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA

D. Barbará
Department of Computer Science, George Mason University, Fairfax, VA 22030, USA

K. Molloy
LAAS-CNRS, 7, avenue du Colonel Roche, 31077 Toulouse, France

© Springer International Publishing Switzerland 2016
K.-C. Wong (ed.), Big Data Analytics in Genomics, DOI 10.1007/978-3-319-41279-5_7


1 Introduction

Molecular biology now finds itself in the era of big data. The field's focus on high-throughput, automated wet-laboratory protocols has resulted in a vast amount of gene sequence, expression, interaction, and protein structure data [212]. In particular, because whole genomes can be sequenced at an increasingly fast pace, we are now faced with millions of protein products for which no functional information is readily available [39, 198]. The December 2015 release of the Universal Protein (UniProt) database [68] contains a little over 55.2 million sequences, less than 1% of which have reliable and detailed annotations. The gap between unannotated and annotated gene/protein sequences has exceeded two orders of magnitude. Fundamental information is currently missing for 40% of the protein sequences deposited in the National Center for Biotechnology Information (NCBI) database; around 32% of the protein sequences in the comprehensive UniProtKB database are currently labeled “unknown.” The missing information includes coarse-grained, low-resolution information, such as where protein products are expressed; meta-resolution information, such as which chemical pathways proteins participate in within the living cell; and high-resolution information, such as what
