Proteogenomic Tools and Approaches to Explore Protein Coding Landscapes of Eukaryotic Genomes

Dhirendra Kumar and Debasis Dash

Abstract

Proteogenomic strategies aim to refine genome-wide annotations of protein coding features by using actual protein-level observations. Most currently applied proteogenomic approaches involve integrative analysis of multiple types of high-throughput omics data, e.g., genomics, transcriptomics and proteomics. Recent efforts towards creating a human proteome map, which primarily aimed to experimentally detect at least one protein product for each gene in the genome, made extensive use of proteogenomic approaches. The 14-year wait for a draft human proteome map, after the completion of similar efforts to sequence the genome, reflects the huge complexity and technical hurdles of such efforts. Further, the integrative analysis of large-scale multi-omics datasets inherent to these studies is a major bottleneck to their success. However, the recent development of various analysis tools and pipelines dedicated to proteogenomics has reduced both the time and complexity of such analysis. Here, we summarize notable approaches, studies, software developments and their potential applications towards eukaryotic genome annotation and clinical proteogenomics.

Keywords

Shotgun proteomics • Peptide identification • RNA-Seq • HUPO • Genome annotation

D. Kumar • D. Dash (*) G.N. Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Mathura Road, Delhi 110025, India e-mail: [email protected]

© Springer International Publishing Switzerland 2016 Á. Végvári (ed.), Proteogenomics, Advances in Experimental Medicine and Biology 926, DOI 10.1007/978-3-319-42316-6_1

1.1 Introduction

Biological systems are complex, self-replicating machines whose major components are proteins. Understanding the dynamics of protein expression in these systems may lead to a better interpretation of the underlying mechanisms and the predictability of potential outcomes. However, the techniques for probing these proteome components are not completely unbiased, i.e., prior knowledge of each component of the proteome is a prerequisite for probing its expression. These proteomic techniques are largely dependent on mass spectrometry (MS) based shotgun proteomics. Mass spectra, containing mass-to-charge ratios and intensities for peptides and their fragments, are searched against a database of known proteins to identify the expressed proteins and their quantities (Eng et al. 2011). One limitation of this method lies in the database itself, against which the spectral data generated in MS are searched. A protein missing from the database cannot be identified, even when it is present in the sample (Frank et al. 2007). Thus, for comprehensive proteome profiling, the search database should be complete. However, most of these databases are neither complete nor error free (Kumar et al. 2016b). Proteogenomic tech