A survey on de novo assembly methods for single-molecular sequencing


REVIEW

Ying Chen, Chuan-Le Xiao*

State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou 510275, China

* Correspondence: [email protected]

Received March 10, 2020; Revised May 17, 2020; Accepted June 13, 2020

Background: Single-molecular sequencing (SMS) is developing rapidly and generating increasingly long and accurate sequences. De novo assembly of genomes from SMS sequences is a critical step for many genomic studies. To keep pace with these developments, many de novo assemblers for SMS have been released. Their assembly workflows fall into two categories: the correction-and-assembly strategy and the assembly-and-correction strategy, both of which are attracting growing attention.

Results: In this article we discuss the characteristics of errors in SMS sequences. We then review the de novo assemblers currently in wide use for SMS sequences. We also describe computational methods relevant to de novo assembly, including alignment methods and error correction methods. Benchmarks are provided to analyze their performance on different datasets and to offer guidance on applying these computational methods.

Conclusion: We review in detail the latest developments in de novo assembly and some relevant algorithms for SMS, including their rationales, solutions and results. We also provide usage guidance for the algorithms based on their benchmark results. Finally, we conclude the review by outlining some development trends in third generation sequencing (TGS).

Keywords: third generation sequencing; single-molecular real-time sequencing; sequence alignment; sequence error correction; de novo assembly

Author summary: In this review, we focus on the error characteristics of SMS sequences and the challenges they pose for de novo assembly. We then describe the latest de novo assembly workflows, covering both correction-and-assembly and assembly-and-correction assemblers. We also introduce some computational methods that are closely related to de novo assembly, including sequence alignment and error correction methods. Benchmarks are provided to analyze their performance on different datasets and to offer guidance on applying these computational methods. We conclude the review by outlining some development trends in TGS.

INTRODUCTION

Third generation sequencing (TGS), also called single-molecule, real-time (SMRT) sequencing, which includes the PacBio RS II platform developed by Pacific Biosciences (PacBio) and the MinION platform developed by Oxford Nanopore, captures sequence information directly during DNA molecule replication. TGS generates reads with an average length on the order of 10 kb, much longer than next generation sequencing (NGS) reads (100–500 bp), which significantly improves the integrity and continuity of de novo genome assemblies. Furthermore, the weak effect of classical sequencing bias, such as GC content, allows for