Generating realistic null hypothesis of cancer mutational landscapes using SigProfilerSimulator

Open Access

SOFTWARE

Erik N. Bergstrom1,2, Mark Barnes1,2, Iñigo Martincorena3 and Ludmil B. Alexandrov1,2*

*Correspondence: [email protected]. edu
1 Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
Full list of author information is available at the end of the article

Abstract

Background: Performing a statistical test requires a null hypothesis. In cancer genomics, a key challenge is the fast generation of accurate somatic mutational landscapes that can be used as a realistic null hypothesis for making biological discoveries.

Results: Here we present SigProfilerSimulator, a powerful tool that is capable of simulating the mutational landscapes of thousands of cancer genomes at different resolutions within seconds. Applying SigProfilerSimulator to 2144 whole-genome sequenced cancers reveals: (i) that most doublet base substitutions are not due to two adjacent single base substitutions but likely occur as single genomic events; (ii) that an extended sequencing context of ± 2 bp is required to more completely capture the patterns of substitution mutational signatures in human cancer; (iii) information on the false-positive discovery rate of commonly used bioinformatics tools for detecting driver genes.

Conclusions: SigProfilerSimulator's breadth of features allows one to construct a tailored null hypothesis and use it for evaluating the accuracy of other bioinformatics tools or for downstream statistical analysis for biological discoveries. SigProfilerSimulator is freely available at https://github.com/AlexandrovLab/SigProfilerSimulator with extensive documentation at https://osf.io/usxjz/wiki/home/.

Keywords: Somatic mutations, Mutational patterns, Mutational signatures
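To illustrate how simulated landscapes can serve as a null hypothesis, the following minimal Python sketch computes a one-sided empirical p-value by comparing an observed statistic against the same statistic measured in simulated genomes. It does not use SigProfilerSimulator itself and is not the authors' method. The function name `empirical_p_value`, the (r + 1) / (n + 1) correction, and all numbers are assumptions made for illustration only.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_values: Sequence[float]) -> float:
    """One-sided empirical p-value: share of null outcomes >= observed.

    Applies the common (r + 1) / (n + 1) correction so that the estimate
    is never exactly zero with a finite number of simulations.
    """
    if not null_values:
        raise ValueError("null_values must contain at least one simulated outcome")
    at_least_as_extreme = sum(1 for value in null_values if value >= observed)
    return (at_least_as_extreme + 1) / (len(null_values) + 1)


# Hypothetical counts of some event (for example, doublet substitutions)
# in 40 simulated genomes. These are made-up values, not data from the paper.
simulated_counts = [3, 5, 4, 6, 2] * 8
observed_count = 12  # hypothetical count in the real sample

p_value = empirical_p_value(observed_count, simulated_counts)
is_significant = p_value < 0.05  # conventional 5% threshold
```

In practice, the simulated values would come from many independent simulations of the same samples, and the number of simulations bounds the smallest p-value that can be reported.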

Background

Performing a statistical evaluation to determine whether an observation is seen by chance necessitates the construction of a null hypothesis corresponding with the expected default position. An observation is generally considered statistically significant if it reflects an unlikely outcome of the null hypothesis. In most practical applications, observations seen in less than 5% of outcomes from a null distribution are considered statistically significant. Large-scale computational analyses of cancer genomes use background mutational models to evaluate driver mutations [1–6], mutational signatures [7], and topographical accumulation of somatic mutations [8]. In almost all cases, a null hypothesis model of the

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the artic