Evaluation of assembly methods combining long-reads and short-reads to obtain Paenibacillus sp. R4 high-quality complete genome
ORIGINAL ARTICLE
Evaluation of assembly methods combining long-reads and short-reads to obtain Paenibacillus sp. R4 high-quality complete genome
Seung Chul Shin1 · Woong Choi2 · Junhyuck Lee2,3 · Hyo Jin Kim4,5 · Han-Woo Kim2,3
Received: 6 August 2020 / Accepted: 7 October 2020
© King Abdulaziz City for Science and Technology 2020
Abstract
We sequenced Paenibacillus sp. R4 using Oxford Nanopore Technology (ONT), single molecule real-time (SMRT) technology from Pacific Biosciences (PacBio), and Illumina technologies to investigate the application of nanopore reads in de novo sequencing of bacterial genomes. We compared the genome sequences of assemblies built from nanopore reads with those built from PacBio reads, focusing on differences in the prediction of coding sequences. The results indicated that, for more accurate prediction of open reading frames, contigs in assemblies using only PacBio reads also needed to be corrected with short reads with high-quality bases, and that repeat regions in the genome did not significantly affect the number of coding sequences mispredicted through genome polishing. In assemblies using only nanopore reads, genome polishing was essential, but abundant repeat regions in the genome might increase the number of coding sequences mispredicted through genome polishing. The hybrid assembly combining long reads and short reads gave the best coding sequence predictions among the genome assemblies using nanopore reads.
Keywords Hybrid assembly · Long-read sequencing · Oxford Nanopore technology · Paenibacillus sp.
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s13205-020-02474-0) contains supplementary material, which is available to authorized users.
* Seung Chul Shin [email protected]
* Han-Woo Kim [email protected]
1 Division of Life Sciences, Korea Polar Research Institute (KOPRI), Inchon 21990, Republic of Korea
2 Unit of Polar Genomics, Korea Polar Research Institute (KOPRI), Inchon 21990, Republic of Korea
3 Department of Polar Sciences, University of Science and Technology, Inchon 21990, Republic of Korea
4 Graduate School of International Agricultural Technology, Seoul National University, Pyeongchang 25354, Republic of Korea
5 Institutes of Green Bio Science and Technology, Seoul National University, Pyeongchang 25354, Republic of Korea
Introduction
The development of long-read (LR) sequencing, or third-generation sequencing, methods is overcoming the early limitations of short-read sequencing and accelerating their application in microbial genomics. Single molecule real-time (SMRT) technology from Pacific Biosciences (PacBio) is the representative sequencing technology used in LR sequencing (Eid et al. 2009) and has been used for complete genome sequencing of many bacterial strains (Chin et al. 2013). The accuracy of each base in raw sequencing reads is known to be nearly 85% (Ross et al. 2013). Recently, another LR technology, Oxford Nanopore Technology (ONT), emerged as a sequencing service and rese