Cistrome Data Browser and Toolkit: analyzing human and mouse genomic data using compendia of ChIP-seq and chromatin acce



PROTOCOL AND TUTORIAL

Cistrome Data Browser and Toolkit: analyzing human and mouse genomic data using compendia of ChIP-seq and chromatin accessibility data

Rongbin Zheng1,2,†, Xin Dong1,2,†, Changxin Wan1,2, Xiaoying Shi1,2, Xiaoyan Zhang2,*, Clifford A. Meyer3,4,*

1 Clinical Translational Research Center, Shanghai Pulmonary Hospital, School of Life Science and Technology, Tongji University, Shanghai 200433, China
2 Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai 200092, China
3 Department of Data Science, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
4 Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA 02215, USA
* Correspondence: [email protected], [email protected]

Received December 12, 2019; Revised January 14, 2020; Accepted January 21, 2020

The Cistrome Data Browser (DB) at the website (cistrome.org/db) provides about 56,000 published human and mouse ChIP-seq, DNase-seq, and ATAC-seq chromatin profiles, which we have processed using uniform analysis and quality control pipelines. The Cistrome DB Toolkit at the website (dbtoolkit.cistrome.org) was developed to allow users to investigate fundamental questions using this data collection. In this tutorial, we describe how to use the Cistrome DB to search for publicly available chromatin profiles, to assess sample quality, to access peak results, to visualize signal intensities, to explore DNA sequence motifs, and to identify putative target genes. We also describe the use of the Toolkit module to seek the factors most likely to regulate a gene of interest, the factors that bind to a given genomic interval (enhancer, SNP, etc.), and samples that have significant peak overlaps with user-defined peak sets. This tutorial guides biomedical researchers in the use of Cistrome DB resources to rapidly obtain valuable insights into gene regulatory questions.

Keywords: ChIP-seq; chromatin accessibility; gene regulatory analysis; transcription factor

INTRODUCTION

Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) is a widely used technique for studying genome-wide DNA-protein interactions and histone modifications [1–3]. The DNase I hypersensitivity (DNase-seq) [4] and transposase-accessible chromatin (ATAC-seq) [5] assays facilitate genome-wide mapping of accessible chromatin, which reflects potential cis-regulatory elements bound by trans-acting factors [6]. ChIP-seq, DNase-seq, and ATAC-seq experiments are being carried out to acquire information about the complex biology of gene regulation. The Encyclopedia of DNA Elements (ENCODE) Consortium [7] and the NIH Roadmap Epigenomics Project [8] have generated many high-quality ChIP-seq samples, targeting various transcription factors (TFs) and histone marks, as well as DNase-seq samples in many cell and tissue types. Besides these projects, a large quantity of ChIP-seq, DNase-seq, and ATAC-seq data has been deposited in the NCBI Gene Expression Omnibus (GEO) [9],