Evolutionary Genomics Statistical and Computational Methods, Volume

Together with early theoretical work in population genetics, the debate on sources of genetic makeup initiated by proponents of the neutral theory made a solid contribution to the spectacular growth in statistical methodologies for molecular evolution. Ev

1. Introduction

DNA sequencing has been extensively used in biology since the dideoxy-based Sanger method was invented in 1977 (1): after three decades, it is difficult to imagine modern biological sciences without sequencing technology, as it has become an indispensable tool for biology researchers in every field. Although the Sanger method played a vital role in sequencing the genomes of many model organisms including the human genome, the recently developed so-called next-generation sequencing (NGS) technologies have made sequencing both more versatile and more applicable. NGS provides sequencing at an unprecedentedly low price in a short amount of time. For example, the Illumina GA platform can

Maria Anisimova (ed.), Evolutionary Genomics: Statistical and Computational Methods, Volume 1, Methods in Molecular Biology, vol. 855, DOI 10.1007/978-1-61779-582-4_5, © Springer Science+Business Media, LLC 2012

H. Lee and H. Tang

Table 1 NGS specifications

Platform  Sequencing chemistry             Read lengths (bp)  Paired-end insert size(s)  Throughput (Gbp/run)a  Throughput (Mbp/h)a
454       Pyrosequencing (10)              ~400               3 kb, 8 kb, 20 kb          0.4                    38.1
GA        SBS with reversible termination  35, 50, 75, 150    500–600 bp, 5 kb, 10 kb    ~90                    267.9
SOLiD     Sequencing by ligation           25, 35, 50         600 bp to 10 kb            ~90                    258.6

a For the throughput calculation, the (50 bp × 2) paired-end kit is used for SOLiD and the (150 bp × 2) paired-end kit is used for GA
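As a rough illustration of how the throughput figures in Table 1 relate to genome coverage, the sketch below divides the bases produced per run by the genome size. It assumes a haploid human genome of about 3.1 Gbp (a value not given in the text) and reads the table as 0.4, ~90, and ~90 Gbp/run with 38.1, 267.9, and 258.6 Mbp/h for 454, GA, and SOLiD, respectively. The function names are hypothetical, and the run times it derives are back-of-the-envelope estimates, not vendor specifications.

```python
# Back-of-the-envelope coverage and run-time estimates from Table 1.
# Assumption: haploid human genome size of ~3.1 Gbp (not stated in the text).
HUMAN_GENOME_GBP = 3.1

# Assumed reading of Table 1: platform -> (Gbp per run, Mbp per hour).
TABLE1 = {
    "454": (0.4, 38.1),
    "GA": (90.0, 267.9),
    "SOLiD": (90.0, 258.6),
}


def fold_coverage(gbp_per_run, genome_gbp=HUMAN_GENOME_GBP):
    """Average depth: total bases sequenced divided by genome length."""
    return gbp_per_run / genome_gbp


def run_time_hours(gbp_per_run, mbp_per_hour):
    """Run duration implied by the two throughput columns (1 Gbp = 1000 Mbp)."""
    return gbp_per_run * 1000.0 / mbp_per_hour


if __name__ == "__main__":
    for platform, (gbp, mbp_h) in TABLE1.items():
        print(
            f"{platform}: ~{fold_coverage(gbp):.1f}x human coverage per run, "
            f"~{run_time_hours(gbp, mbp_h):.0f} h per run"
        )
```

Under these assumptions, a ~90 Gbp run corresponds to roughly 30-fold average coverage of the human genome. This is an average only: it ignores read filtering, duplicates, and uneven coverage across the genome.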

produce a 30-fold coverage of the human genome in a single experiment (Table 1). Because of this high throughput, recent resequencing projects (2–5), which often serve as a crucial step in comparative genomics, take advantage of one or more NGS platforms.

However, NGS does not outperform the Sanger method in every respect: it suffers from both shorter read lengths and a higher error rate. The most popular platforms, such as Illumina GA and AB SOLiD, offer reads only 25–150 nucleotides long, considerably shorter than the 800–1,000 nucleotides of the Sanger method. As a result, NGS genome projects have difficulty with assembly and repeat resolution, and often require much higher sequencing coverage than Sanger sequencing. To a certain extent, this issue is being alleviated by the increasing ability of NGS methods to produce mate-pair or paired-end reads (Fig. 1a, b).

Despite these weaknesses, NGS remains attractive because it provides massive amounts of data at a much lower cost. This volume of data makes applications such as ChIP-Seq (6), RNA-seq (7), methylome sequencing (8), and exome sequencing (9) possible on a whole-genome scale. Many of these applications were once based on array hybridization methods (i.e., microarrays), but NGS now delivers results at finer resolution. NGS undoubtedly delivers a high volume of data, but the versatility and applicability that it offers cannot be fully achieved without overcoming the resulting bioinfor