Genome-wide SNP identification by high-throughput sequencing and selective mapping allows sequence assembly positioning


METHODOLOGY ARTICLE

Open Access

Genome-wide SNP identification by high-throughput sequencing and selective mapping allows sequence assembly positioning using a framework genetic linkage map

Jean-Marc Celton1*, Alan Christoffels2, Daniel J Sargent3, Xiangming Xu3, D Jasper G Rees1,4

Abstract

Background: Determining the position and order of contigs and scaffolds from a genome assembly within an organism’s genome remains a technical challenge in the majority of sequencing projects. To exploit contemporary DNA sequencing technologies, we developed a strategy for whole-genome single nucleotide polymorphism (SNP) sequencing that allows sequence contigs to be positioned on a linkage map using the bin mapping method.

Results: The strategy was tested on a draft genome of the fungal pathogen Venturia inaequalis, the causal agent of apple scab, and further validated using sequence contigs derived from the diploid plant genome Fragaria vesca. Using this method, we anchored 70% and 92% of the sequence assemblies for V. inaequalis and F. vesca, respectively, to genetic linkage maps.

Conclusions: We demonstrated the utility of this approach by accurately determining the bin map positions of the majority of the large sequence contigs from each genome sequence, and validated the method by mapping simple sequence repeat (SSR) markers derived from sequence contigs on a full mapping population.

* Correspondence: [email protected]
1 Biotechnology Department, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa
Full list of author information is available at the end of the article

Background

The recent introduction of Next Generation Sequencing platforms, such as the Applied Biosystems SOLiD sequencer, the Roche (454) sequencer and the Illumina Genome Analyzer, has led to an exponential increase in genome sequencing efforts for a wide range of organisms. Over the last 2 years, the genomes of a variety of organisms, such as cow [1], papaya [2], cucumber [3] and the filamentous fungus Grosmannia clavigera [4], have been sequenced using these platforms. From the short overlapping sequence fragments obtained, it is possible to generate draft genome sequences using various algorithms developed for de novo sequence assembly [5-7]. Despite improvements in the software used to assemble small DNA sequences, it is very difficult to build a fully assembled genome using short read sequence data alone. The number of contiguous sequences in the final assembly can vary from tens to several thousand, depending on the accuracy of the primary sequence data, the depth of sequence coverage, the length and number of sequence repeats and the genome size of the organism studied.

Various methods have been developed to position sequence scaffolds on physical or genetic maps to assist in the assembly process. Positional information for assemblies can, for instance, be derived from comparison with genomic sequences of related organisms. For relatively small genomes with limited numbers of sequence repeats, gaps between ge