SequelTools: a suite of tools for working with PacBio Sequel raw sequence data


Open Access

SOFTWARE

David E. Hufnagel1,2*, Matthew B. Hufford1 and Arun S. Seetharam3

*Correspondence: [email protected]
2 Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA 50010, USA
Full list of author information is available at the end of the article

Abstract

Background: PacBio sequencing is a valuable third-generation DNA sequencing method owing to its very long read lengths, its ability to detect methylated bases, and its real-time sequencing methodology. Until now, however, no tool was available for analyzing the quality of, subsampling, and filtering PacBio data.

Results: Here we present SequelTools, a command-line program containing three tools: Quality Control, Read Subsampling, and Read Filtering. The Quality Control tool quickly processes PacBio Sequel raw sequence data from multiple SMRTcells, producing statistics and publication-quality plots that describe the quality of the data, including N50, read length and count statistics, PSR, and ZOR. The Read Subsampling tool allows the user to subsample reads by one or more of the following criteria: longest subreads per CLR or random CLR selection. The Read Filtering tool provides options for normalizing data by filtering out certain low-quality scraps reads and/or by minimum CLR length. SequelTools is implemented in bash, R, and Python using only standard libraries and packages and is platform independent.

Conclusions: SequelTools provides the only free, fast, and easy-to-use quality control tool, and is the only program providing this kind of read subsampling and read filtering for PacBio Sequel raw sequence data. It is available at https://github.com/ISUgenomics/SequelTools.

Keywords: Genomics, Next-generation sequencing, Third-generation sequencing, PacBio, Sequel
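To make two of the quantities named in the abstract concrete, the following is a minimal Python sketch of an N50 calculation and of "longest subread per CLR" selection. It is an illustration only and is not taken from the SequelTools source: the function names `n50` and `longest_subread_per_clr`, and the `(clr_id, sequence)` input layout, are assumptions made for this example, and real Sequel data would first have to be parsed from BAM files.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input: no reads, so no meaningful N50


def longest_subread_per_clr(subreads):
    """Keep only the longest subread for each CLR.

    `subreads` is an iterable of (clr_id, sequence) pairs. This layout is
    a hypothetical stand-in for parsed records, not a SequelTools format.
    """
    best = {}
    for clr_id, seq in subreads:
        # Replace the stored subread only when a strictly longer one appears.
        if clr_id not in best or len(seq) > len(best[clr_id]):
            best[clr_id] = seq
    return best
```

For example, `n50([2, 3, 4, 5, 6])` walks the lengths from longest to shortest and stops at the first read where the running total reaches half of the 20 total bases.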

Background

The third generation of sequencing is here and is making a tremendous impact in the field of genomics. The primary contenders in third-generation sequencing are Pacific Biosciences (PacBio) (Sequel, Sequel2) and Oxford Nanopore (MinION, GridION, and PromethION). These new sequencing platforms are undergoing active development and pushing boundaries in terms of total output, read length, sequencing time, cost reduction and read accuracy [1, 2]. The recently introduced PacBio Sequel/Sequel2 platforms, which rely on Single-Molecule Real Time (SMRT) sequencing technology, are among the most widely used long-read sequencing approaches [2, 3]. In contrast to second-generation methodologies, PacBio provides longer length reads, in much less time, with greatly

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or oth