OncoMiner: A Pipeline for Bioinformatics Analysis of Exonic Sequence Variants in Cancer


M.-Y. Leung, Department of Mathematical Sciences, Bioinformatics and Computational Science Programs, and Border Biomedical Research Center, The University of Texas at El Paso, El Paso, TX, USA. e-mail: [email protected]

J.A. Knapka, Bioinformatics Program, The University of Texas at El Paso, El Paso, TX, USA. e-mail: [email protected]

A.E. Wagler, Department of Mathematical Sciences, Computational Science Program, and Border Biomedical Research Center, The University of Texas at El Paso, El Paso, TX, USA. e-mail: [email protected]

G. Rodriguez • R.A. Kirken, Department of Biological Sciences and Border Biomedical Research Center, The University of Texas at El Paso, El Paso, TX, USA. e-mail: [email protected]; [email protected]

© Springer International Publishing Switzerland (outside the USA) 2016. K.-C. Wong (ed.), Big Data Analytics in Genomics, DOI 10.1007/978-3-319-41279-5_12

Abstract With recent developments in high-throughput sequencing technologies, whole exome sequencing (WES) data have become a rich source of information from which scientists can explore the overall mutational landscape in patients with various types of cancers. We have developed the OncoMiner pipeline for mining WES data to identify exonic sequence variants, link them with associated research literature, visualize their genomic locations, and compare their occurrence frequencies among different groups of subjects. This pipeline, written in Python on an IBM High-Performance Cluster, HPC Version 3.2, is accessible at oncominer.utep.edu. It begins by taking all the identified missense mutations of an individual and translating the affected genes based on the Genome Reference Consortium's human genome build 37. After constructing a list of exonic sequence variants from the individual, OncoMiner uses the PROVEAN scoring scheme to assess each variant's functional consequences, followed by PubMed searches to link the variant to previous reports. Users can then select subjects to visualize their PROVEAN score profiles with Circos diagrams and to compare the proportions of variant occurrences between different groups using Fisher's exact tests. As such statistical comparisons typically involve many hypothesis tests, options for multiple-test corrections are included to control familywise error or false discovery rates. We have used OncoMiner to analyze variants of cancer-related genes in 14 samples taken from patients with cancer, six from cancer cell lines, and ten from normal individuals. Variants showing significant differences between the cancer and control groups are identified, and experiments are being designed to elucidate their roles in cancer.

Keywords Computational pipeline • Cancer research • Exome • Exonic sequence variants • Bioinformatics
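The group comparison described in the abstract (Fisher's exact tests followed by multiple-test corrections) can be illustrated with a minimal sketch. This is not OncoMiner's code: the function names, the variant labels, and the carrier counts below are hypothetical, and Bonferroni and Benjamini-Hochberg are chosen here only as common examples of familywise-error and false-discovery-rate control; the abstract does not say which corrections the pipeline offers.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. cancer, control); columns are variant present/absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in row 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(pvals):
    """Bonferroni-adjusted p-values (controls the familywise error rate)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (cancer with variant, cancer without,
#                       control with variant, control without).
# Group sizes mirror the 20 cancer-derived and 10 normal samples in the abstract;
# the carrier counts themselves are invented for illustration.
counts = {
    "variant_A": (12, 8, 1, 9),
    "variant_B": (5, 15, 3, 7),
    "variant_C": (9, 11, 0, 10),
}

raw = [fisher_exact_two_sided(*table) for table in counts.values()]
for name, p, p_fwer, p_fdr in zip(counts, raw, bonferroni(raw), benjamini_hochberg(raw)):
    print(f"{name}: raw={p:.4f}  bonferroni={p_fwer:.4f}  bh={p_fdr:.4f}")
```

The exact test is used rather than a chi-square approximation because the groups are small; with only ten control samples, expected cell counts can easily fall below the range where the approximation is reliable.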

1 Introduction

Advances in next-generation sequencing technologies over the past few years have greatly facilitated research on many human diseases at the genomic level. In a genome, the collection of all protein-coding regions, known as exons, is called the exome. Althoug
