OncoMiner: A Pipeline for Bioinformatics Analysis of Exonic Sequence Variants in Cancer


Abstract With recent developments in high-throughput sequencing technologies, whole exome sequencing (WES) data have become a rich source of information from which scientists can explore the overall mutational landscape in patients with various types of cancers. We have developed the OncoMiner pipeline for mining WES data to identify exonic sequence variants, link them with associated research literature, visualize their genomic locations, and compare their occurrence frequencies among different groups of subjects. This pipeline, written in Python on an IBM High-Performance Cluster, HPC Version 3.2, is accessible at oncominer.utep.edu. It begins by taking all the identified missense mutations of an individual and translating the affected genes based on the Genome Reference Consortium’s human genome build 37. After constructing a list of exonic sequence variants from the individual, OncoMiner uses the PROVEAN scoring scheme to assess each variant’s functional consequences, followed by PubMed searches to link the variant to previous reports. Users can then select subjects to visualize their PROVEAN score profiles with Circos diagrams and to compare the proportions of variant occurrences between different groups using Fisher’s exact tests. As such statistical comparisons typically involve many hypothesis tests, options for multiple-test corrections are included to control familywise error or false discovery rates. We have used OncoMiner to analyze variants of cancer-related genes in 14 samples

M.-Y. Leung
Department of Mathematical Sciences, Bioinformatics and Computational Science Programs, and Border Biomedical Research Center, The University of Texas at El Paso, El Paso, TX, USA
e-mail: [email protected]

J.A. Knapka
Bioinformatics Program, The University of Texas at El Paso, El Paso, TX, USA
e-mail: [email protected]

A.E. Wagler
Department of Mathematical Sciences, Computational Science Program, and Border Biomedical Research Center, The University of Texas at El Paso, El Paso, TX, USA
e-mail: [email protected]

G. Rodriguez • R.A. Kirken
Department of Biological Sciences and Border Biomedical Research Center, The University of Texas at El Paso, El Paso, TX, USA
e-mail: [email protected]; [email protected]

© Springer International Publishing Switzerland (outside the USA) 2016
K.-C. Wong (ed.), Big Data Analytics in Genomics, DOI 10.1007/978-3-319-41279-5_12


taken from patients with cancer, six from cancer cell lines, and ten from normal individuals. Variants showing significant differences between the cancer and control groups are identified and experiments are being designed to elucidate their roles in cancer.

Keywords Computational pipeline • Cancer research • Exome • Exonic sequence variants • Bioinformatics
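The group comparison described in the abstract (Fisher’s exact tests followed by multiple-test corrections) can be illustrated with a minimal Python sketch. This is not the OncoMiner code: the function names, the choice of Bonferroni and Benjamini-Hochberg as the familywise and false-discovery-rate corrections, and the variant counts in the usage lines are all assumptions made for illustration. Only the Python standard library is used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. cancer, control); columns are variant present/absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x carriers fall in the first group
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(pvalues):
    """Bonferroni adjustment (controls the familywise error rate)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjustment (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts (NOT from the chapter): carriers / non-carriers of three
# variants among 14 cancer samples and 10 normal samples.
tables = [(9, 5, 1, 9), (4, 10, 3, 7), (7, 7, 0, 10)]
raw = [fisher_exact_two_sided(*t) for t in tables]
print("raw p-values:       ", raw)
print("Bonferroni-adjusted:", bonferroni(raw))
print("BH-adjusted:        ", benjamini_hochberg(raw))
```

Bonferroni is the more conservative of the two; when many variants are tested at once, a false-discovery-rate procedure such as Benjamini-Hochberg usually retains more power, at the cost of tolerating a controlled fraction of false positives.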

1 Introduction

Advances in next-generation sequencing technologies in the past few years have greatly facilitated research studies on many human diseases at the genomic level. In a genome, the collection of all protein-coding regions, known as exons, is called the exome. Althoug