State of the art de novo assembly of human genomes from massively parallel sequencing data


Yingrui Li,1 Yujie Hu,1,2 Lars Bolund,1,3 and Jun Wang1,2*

1 BGI-Shenzhen, Shenzhen, Guangdong 518083, China
2 The Graduate University of the Chinese Academy of Sciences, Beijing 100062, China
3 Department of Biology, University of Copenhagen, Copenhagen DK-2200, Denmark; Danish Center for Translational Breast Cancer Research, Copenhagen, Denmark; Institute of Human Genetics, University of Aarhus, Denmark

*Correspondence to: E-mail: [email protected]

Date received (in revised form): 17th March 2010

Abstract

Recent studies in human genomes have demonstrated the use of de novo assemblies to identify genetic variations that are difficult for mapping-based approaches. Construction of multiple human genome assemblies is enabled by massively parallel sequencing, but a conventional bioinformatics solution is costly and slow, creating bottlenecks in the process. This review describes two public short-read de novo assembly applications that can handle human genomes, ABySS and SOAPdenovo. It also discusses the technical aspects and future challenges of human genome de novo assembly by short reads.

Keywords: de novo assembly, de Bruijn graph, massively parallel sequencing

Introduction

One of the important goals of bioinformatics is to decipher the genome DNA sequence of a species. The genome serves as the digital basis of any life science. Access to a reference genome sequence for a species significantly facilitates biological studies, as proven by all the genomics-guided research in the wake of the Human Genome Project.1 It is conventionally believed that when a reference genome is available, any following studies will take a mapping-based ‘re-sequencing’ approach aiming for variation detection, as seen in many projects of human genomics.2,3 Recent studies, however, suggest that assembly-based approaches have greater potential to detect a more complete set of genetic variations, especially novel sequences4 and structural variations,5 even in relatively well-studied human genomes. Thus, assembly of individual genomes has again been brought to the frontier of bioinformatics. With multiple assembled individual genomes available, it would be very interesting to see how rearrangements of different length scales and individual-specific sequences are distributed in the populations.

The size of the human genome constrained individual human assembly by conventional Sanger sequencing because of costs. Second-generation sequencing technology produces large amounts of data more affordably, but the intrinsic high throughput and short read length present considerable challenges to bioinformatics because of the difficulties in handling the data structure and in applying an appropriate assembly algorithm. Although many short-read de novo assemblers have been developed,6 only two of them, ABySS7 and SOAPdenovo,8 are said to be capable of assembling human genomes de novo. This paper presents a review of the two software packages and discusses the
