Bayesian copy number detection and association in large-scale studies

RESEARCH ARTICLE

Open Access

Stephen Cristiano1†, David McKean2†, Jacob Carey1†, Paige Bracci3, Paul Brennan4, Michael Chou5, Mengmeng Du6, Steven Gallinger7, Michael G. Goggins8,9, Manal M. Hassan10, Rayjean J. Hung7, Robert C. Kurtz11, Donghui Li12, Lingeng Lu13, Rachel Neale14, Sara Olson6, Gloria Petersen15, Kari G. Rabe15, Jack Fu1, Harvey Risch13, Gary L. Rosner1,10, Ingo Ruczinski1, Alison P. Klein2,5,9* and Robert B. Scharpf1,2*

Abstract

Background: Germline copy number variants (CNVs) increase risk for many diseases, yet detection of CNVs and quantifying their contribution to disease risk in large-scale studies is challenging due to biological and technical sources of heterogeneity that vary across the genome within and between samples.

Methods: We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease.

Results: Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated in a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3).

Conclusions: Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases.

Keywords: Pancreatic cancer, SNP array, Copy number variants, Genome-wide association, CNPBayes, Batch effects
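To make the idea of batch-specific probabilistic copy number estimates concrete, the following is a minimal sketch in Python. It is not the CNPBayes implementation; it only illustrates how one intensity value can receive different copy number posteriors when the Gaussian mixture parameters differ by batch. The batch names, component means, standard deviations, mixing weights, and function names are all hypothetical values chosen for illustration.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def copy_number_posterior(x, means, sds, weights):
    """Posterior probability of each mixture component for one sample.

    Uses the log-sum-exp trick so extreme values do not underflow.
    """
    log_joint = [
        math.log(w) + log_normal_pdf(x, m, s)
        for m, s, w in zip(means, sds, weights)
    ]
    top = max(log_joint)
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical parameters (assumptions, not estimates from the study):
# component 0 is a hemizygous deletion (copy number 1),
# component 1 is the diploid state (copy number 2).
COPY_NUMBERS = [1, 2]
BATCH_PARAMS = {
    "batch_A": {"means": [-0.5, 0.0], "sds": [0.1, 0.1], "weights": [0.1, 0.9]},
    "batch_B": {"means": [-0.3, 0.2], "sds": [0.1, 0.1], "weights": [0.1, 0.9]},
}

def expected_copy_number(x, batch):
    """Posterior mean copy number for intensity x under a batch's mixture."""
    p = BATCH_PARAMS[batch]
    post = copy_number_posterior(x, p["means"], p["sds"], p["weights"])
    return sum(c * q for c, q in zip(COPY_NUMBERS, post))

# The same observed value is scored under each batch's parameters.
x_obs = -0.2
post_a = copy_number_posterior(x_obs, **BATCH_PARAMS["batch_A"])
post_b = copy_number_posterior(x_obs, **BATCH_PARAMS["batch_B"])
```

With these illustrative parameters, the value -0.2 is almost certainly diploid in `batch_A` but almost certainly a deletion in `batch_B`, which is the kind of discrepancy that ignoring batch would hide. A posterior mean such as `expected_copy_number(x_obs, "batch_B")`, rather than a hard integer call, is one simple way a downstream regression could carry copy number uncertainty forward.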

*Correspondence: [email protected]; [email protected]

†Stephen Cristiano, David McKean, and Jacob Carey contributed equally to this work.

2 Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA

5 Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA

Full list of author information is available at the end of the article

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium
