GraphAligner: rapid and versatile sequence-to-graph alignment


(2020) 21:253

SOFTWARE

Open Access

Mikko Rautiainen1,2,3* and Tobias Marschall4*

*Correspondence: [email protected]; [email protected]
1 Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, 66123 Saarbrücken, Germany
4 Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 5, 40225 Düsseldorf, Germany
Full list of author information is available at the end of the article

Abstract
Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools.
Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner
Keywords: Genome graphs, Sequence alignment, Pangenome, Error correction, Long reads
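To make the core operation concrete, the following Python sketch computes the minimum edit distance between a read and any path through a small acyclic genome graph. It is a plain textbook dynamic program chosen for exposition, not the algorithm GraphAligner implements. The function name `align_to_dag`, the one-character-per-node representation, the requirement that nodes be listed in topological order, and the semi-global scoring (the whole read is aligned, while the graph path may start and end anywhere) are all assumptions of this sketch.

```python
from typing import List, Sequence


def align_to_dag(read: str, labels: Sequence[str], preds: Sequence[Sequence[int]]) -> int:
    """Minimum edit distance between `read` and any path in a DAG (illustrative sketch).

    Assumptions: nodes are 0..n-1 in topological order, labels[v] is the single
    character of node v, and preds[v] lists its in-neighbours (all < v).
    The whole read must be aligned, but the path may start and end at any node.
    """
    m = len(read)
    # Virtual start row: aligning read[:j] before entering the graph costs j insertions.
    start = list(range(m + 1))
    rows: List[List[int]] = []
    for v, label in enumerate(labels):
        # Any node may begin a path, so the start row is always a candidate predecessor.
        candidates = [start] + [rows[p] for p in preds[v]]
        row = [0] * (m + 1)
        # Column 0: no read characters consumed yet, so node v's character is deleted.
        row[0] = min(c[0] for c in candidates) + 1
        for j in range(1, m + 1):
            sub = 0 if label == read[j - 1] else 1
            best = min(c[j - 1] for c in candidates) + sub        # match or mismatch
            best = min(best, min(c[j] for c in candidates) + 1)  # deletion of node v
            best = min(best, row[j - 1] + 1)                     # insertion of read[j-1]
            row[j] = best
        rows.append(row)
    # The path may end at any node; start[m] covers aligning the read to an empty path.
    return min([start[m]] + [r[m] for r in rows])


if __name__ == "__main__":
    # Hypothetical bubble graph: A -> (C | G) -> T -> A, spelling ACTA or AGTA.
    labels = "ACGTA"
    preds = [[], [0], [0], [1, 2], [3]]
    print(align_to_dag("AGTA", labels, preds))   # 0: follows the G branch exactly
    print(align_to_dag("ACGTA", labels, preds))  # 1: no path contains both C and G
```

The read AGTA aligns without edits along the G branch of the bubble, which a linear reference containing only ACTA could not offer. This per-node, per-read-position table grows with graph size times read length, which hints at why naive approaches become slow for long reads and large graphs.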

Background
Graphs provide a natural way of expressing variation or uncertainty in a genome [1, 2]. They have been used for diverse applications such as genome assembly [3–5], error correction [6–8], short tandem repeat genotyping [9], structural variation genotyping [10], and reference-free haplotype reconstruction [11]. With the growing use of graphs, methods for handling them efficiently are becoming a crucial requirement for many applications.
Sequence alignment is one of the most fundamental operations in bioinformatics and is necessary for a wide range of analyses. Aligning a sequence to a sequence is a well-studied problem with many highly optimized tools [12–15]. In contrast, aligning sequences to graphs is a newer field in which practical tools are only beginning to emerge, and most existing tools are specialized for a single purpose, such as error correction [6–8] or hybrid genome assembly [4]. The VG toolkit [16] provides a set of general-purpose tools for working with genome graphs. Although VG is capable of mapping long reads to graphs, it was tuned for aligning short reads, which leads to slow runtimes for long-read alignment. In summary, there is presently a lack of general-purpose tools for aligning long third-generation sequencing

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the art