PyClone-VI: scalable inference of clonal population structures using whole genome data


METHODOLOGY ARTICLE

Open Access

Sierra Gillis1 and Andrew Roth1,2,3*

*Correspondence: [email protected]
3 Department of Pathology and Laboratory Medicine, University of British Columbia, 2211 Wesbrook Mall, Vancouver V6T 1Z7, Canada
Full list of author information is available at the end of the article

Abstract

Background: At diagnosis, tumours are typically composed of a mixture of genomically distinct malignant cell populations. Bulk sequencing of tumour samples coupled with computational deconvolution can be used to identify these populations and study cancer evolution. Existing computational methods for population deconvolution are slow and/or potentially inaccurate when applied to the large datasets generated by whole genome sequencing.

Results: We describe PyClone-VI, a computationally efficient Bayesian statistical method for inferring the clonal population structure of cancers. We demonstrate the utility of the method by analyzing data from 1717 patients from the PCAWG study and 100 patients from the TRACERx study.

Conclusions: Our proposed method is 10–100× faster than existing methods, while providing results that are as accurate. Software implementing our method is freely available at https://github.com/Roth-Lab/pyclone-vi.

Keywords: Cancer, Tumour heterogeneity, Cancer evolution, Bayesian statistics

Background

Cancer is an evolutionary process driven by ongoing somatic mutation within the malignant cell population [1, 2]. The combination of mutation, drift, and selection leads to heterogeneity within the population of cancer cells. Identifying population structure and quantifying the amount of heterogeneity in tumours is an important problem that has been extensively studied [3–8]. High throughput sequencing (HTS) provides a powerful approach to this problem, with both bulk and single cell approaches being employed. While single cell sequencing approaches can more accurately resolve clonal population structure, they are not widely available and are limited by both technical constraints and cost. Using bulk sequencing to study heterogeneity thus remains the predominant approach, and methods for studying heterogeneity using bulk sequencing will become even more important as HTS is increasingly used in translational and clinical work [9–12].

Identifying population structure and quantifying heterogeneity from bulk sequencing data is a computationally challenging problem. The core issue is to deconvolve sequence

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicate
