Genomic Structural Variants Methods and Protocols

The completion of a consensus draft sequence for the human genome was the starting point for more thorough investigations of individual genome variation.  The development of array-based strategies made it possible to look at our genome in new ways an


1. Introduction

Massively parallel sequencing technologies have fundamentally changed the study of genetics and genomics. New instruments from Roche (454), Illumina (Genome Analyzer and HiSeq 2000), and Applied Biosystems (SOLiD) generate millions of DNA sequence reads in a single run, enabling researchers to address questions with unprecedented speed (1). These next-generation sequencing (NGS) technologies make it feasible to sequence entire genomes to high levels of coverage in a matter of weeks. Indeed, the complete genomes of several individuals have been sequenced on the new platforms (2–9), and ambitious efforts like the 1000 Genomes Project (http://www.1000genomes.org) aim to add thousands more, offering an unprecedented survey of DNA sequence variation in humans.

Lars Feuk (ed.), Genomic Structural Variants: Methods and Protocols, Methods in Molecular Biology, vol. 838, DOI 10.1007/978-1-61779-507-7_18, © Springer Science+Business Media, LLC 2012

D.C. Koboldt et al.

NGS has enabled powerful new approaches for the detection of copy number variation (CNV) and structural variation (SV) in the human genome. Compared with array-based methods, NGS has demonstrated higher sensitivity in terms of the types and sizes of variants that can be detected. Furthermore, sequencing enables the precise definition of SV breakpoints, information that is critical for assessing functional impact and inferring likely mutational mechanisms of origin.

Most current approaches to sequence-based SV detection extend seminal work by Volik et al. (10) and Raphael et al. (11). Their method, first presented in 2003, applied end-sequence profiling (ESP) of bacterial artificial chromosomes to map structural rearrangements in cancer cell lines. The ESP method requires sequencing both ends of a genomic fragment of known size (e.g., a 200-kb BAC insert) and then mapping the end-sequence pair to a reference sequence. Fragments that overlap SV events yield paired sequences that map to unexpected positions in the reference genome: too far apart, too close together, in the wrong orientation, or possibly on another chromosome entirely. In 2005, Tuzun et al. (12) used this approach to systematically discover SVs in a human genome, reporting hundreds of intermediate-sized variants, including insertions, deletions, and inversions.

Paired-end sequencing on NGS platforms has enabled detection of CNV and SV in the human genome at unprecedented scale and throughput, and at a substantially reduced cost. Korbel et al. (13) developed a paired-end mapping (PEM) approach for the Roche/454 platform and used it to fine-map more than 1,000 SVs in two human genomes. Campbell et al. (14) used Illumina paired-end sequencing to characterize genomic rearrangements in cancer cell lines. Massively parallel sequencing has since been employed to systematically characterize large-scale variation in individual (2, 6, 8, 13) and cancer (14–17) genomes.
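The logic behind end-sequence profiling and paired-end mapping can be illustrated with a short sketch. The Python code below is not the method of any cited study; it is a simplified illustration that assumes each end has already been mapped to a chromosome, position, and strand, and that a concordant pair maps to opposite strands of the same chromosome at roughly the expected fragment size. The names `EndPair` and `classify_pair`, and the `tolerance` parameter, are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class EndPair:
    """Mapped positions of the two end sequences of one fragment (hypothetical type)."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str  # '+' or '-'


def classify_pair(pair: EndPair, expected_size: int, tolerance: int) -> str:
    """Label a mapped end pair as concordant or as evidence for a type of SV.

    Assumption: concordant ends map to opposite strands of the same
    chromosome, separated by expected_size +/- tolerance bases.
    """
    # Ends on different chromosomes suggest an interchromosomal rearrangement.
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Ends on the same strand suggest that one end lies in an inverted segment.
    if pair.strand1 == pair.strand2:
        return "inversion"
    span = abs(pair.pos2 - pair.pos1)
    # Ends farther apart on the reference than the fragment is long:
    # the sample lacks sequence that the reference has.
    if span > expected_size + tolerance:
        return "deletion"
    # Ends closer together than the fragment is long:
    # the sample carries sequence that the reference lacks.
    if span < expected_size - tolerance:
        return "insertion"
    return "concordant"


if __name__ == "__main__":
    # A 200-kb insert (as in the BAC example) with an illustrative 20-kb tolerance.
    pair = EndPair("chr1", 1_000, "+", "chr1", 301_000, "-")
    print(classify_pair(pair, expected_size=200_000, tolerance=20_000))
```

In practice, a single discordant pair is weak evidence; SV callers generally require clusters of pairs supporting the same event, and the tolerance would be derived from the observed fragment-size distribution rather than fixed by hand.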
Although NGS platforms are well-suited to CNV and SV detection, analysis of NGS data presents substantial bioinformatics challenges due to the relatively short read lengths (36–250 bp)