PyClone-VI: scalable inference of clonal population structures using whole genome data



METHODOLOGY ARTICLE

Open Access

Sierra Gillis1 and Andrew Roth1,2,3*

*Correspondence: [email protected]
3 Department of Pathology and Laboratory Medicine, University of British Columbia, 2211 Wesbrook Mall, Vancouver V6T 1Z7, Canada
Full list of author information is available at the end of the article

Abstract

Background: At diagnosis, tumours are typically composed of a mixture of genomically distinct malignant cell populations. Bulk sequencing of tumour samples coupled with computational deconvolution can be used to identify these populations and study cancer evolution. Existing computational methods for population deconvolution are slow and/or potentially inaccurate when applied to the large datasets generated by whole genome sequencing.

Results: We describe PyClone-VI, a computationally efficient Bayesian statistical method for inferring the clonal population structure of cancers. We demonstrate the utility of the method by analyzing data from 1717 patients from the PCAWG study and 100 patients from the TRACERx study.

Conclusions: Our proposed method is 10–100× faster than existing methods, while providing results that are as accurate. Software implementing our method is freely available at https://github.com/Roth-Lab/pyclone-vi.

Keywords: Cancer, Tumour heterogeneity, Cancer evolution, Bayesian statistics

Background

Cancer is an evolutionary process driven by ongoing somatic mutation within the malignant cell population [1, 2]. The combination of mutation, drift, and selection leads to heterogeneity within the population of cancer cells. Identifying population structure and quantifying the amount of heterogeneity in tumours is an important problem which has been extensively studied [3–8]. High throughput sequencing (HTS) provides a powerful approach to solving the problem, with both bulk and single cell approaches being employed. While single cell sequencing approaches can more accurately resolve clonal population structure, they are not widely available and are limited by both technical constraints and cost. Using bulk sequencing to study heterogeneity thus remains the predominant approach, and methods for studying heterogeneity using bulk sequencing will become even more important as HTS is increasingly used in translational and clinical work [9–12].

Identifying population structure and quantifying heterogeneity from bulk sequencing data is a computationally challenging problem. The core issue is to deconvolve sequence

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicate