CELLO: a longitudinal data analysis toolbox untangling cancer evolution


PROTOCOL AND TUTORIAL

Biaobin Jiang1,†, Dong Song2,†, Quanhua Mu1, Jiguang Wang1,2,3,4,*

1 Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
2 Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong, China
3 Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Hong Kong, China
4 State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Hong Kong, China
* Correspondence: [email protected]

Received February 29, 2020; Revised June 8, 2020; Accepted July 9, 2020

The complex pattern of cancer evolution poses a major challenge to precision oncology. Longitudinal sequencing of tumor samples allows us to monitor the dynamics of mutations that arise during clonal evolution. Here, we present a versatile toolbox, CELLO (Cancer EvoLution for LOngitudinal data), accompanied by a step-by-step tutorial, to show how to profile, analyze and visualize dynamic changes in the somatic mutational landscape using longitudinal genomic sequencing data. Moreover, we customize the hypermutation detection module in CELLO to accommodate targeted-DNA and whole-transcriptome sequencing data, and verify the broad applicability of CELLO on published longitudinal datasets from brain, bladder and breast cancers. The entire tutorial and reusable programs in MATLAB, R and Docker versions are openly available at https://github.com/WangLabHKUST/CELLO.

Keywords: cancer evolution; genomics; longitudinal sequencing; bioinformatics

INTRODUCTION

Targeting tumor-specific mutations with customized chemical compounds can precisely eradicate cancer cells without harming healthy tissue, paving the way toward precision oncology. However, this strategy has not been successful in many refractory cancers such as glioblastoma (GBM). One of the main obstacles is our limited understanding of cancer evolution, in which cancer cells may acquire a fitness advantage that lets them regrow under treatment stress. To study cancer evolution, researchers collect tumor samples from different locations (multiregional) and/or at different time points (longitudinal) from the same patients. However, collecting such data is extremely challenging, partly because of limited tumor resectability. One way to overcome this difficulty is to integrate data from multiple sources, which increases statistical power and can lead to new discoveries hidden in large-scale public datasets. Recently, Wang et al. integrated longitudinal genomic data of GBM patients from six different sources [1]; this integration revealed the pattern of GBM evolution under therapy and identified several somatic mutations found exclusively in tumors after treatment. Here, we summarized the computational methods used in that study [1], developed an easy-to-use toolbox, namely CELLO (Cancer EvoLution for LOngi