Metalign: efficient alignment-based metagenomic profiling via containment min hash
SOFTWARE
Open Access
Nathan LaPierre1*, Mohammed Alser2, Eleazar Eskin1,3,4, David Koslicki5,6,7*† and Serghei Mangul8*†
* Correspondence: nathanl2012@gmail.com; [email protected]; [email protected]
† David Koslicki and Serghei Mangul contributed equally to this work.
1 Department of Computer Science, University of California, Los Angeles, CA 90095, USA
5 Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
8 Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
Full list of author information is available at the end of the article
Abstract
Metagenomic profiling, predicting the presence and relative abundances of microbes in a sample, is a critical first step in microbiome analysis. Alignment-based approaches are often considered accurate yet computationally infeasible. Here, we present a novel method, Metalign, that performs efficient and accurate alignment-based metagenomic profiling. We use a novel containment min hash approach to pre-filter the reference database prior to alignment and then process both uniquely aligned and multi-aligned reads to produce accurate abundance estimates. In performance evaluations on both real and simulated datasets, Metalign is the only method evaluated that maintained high performance and competitive running time across all datasets.
Keywords: Metagenomics, Abundance estimation, Profiling, Alignment
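To make the pre-filtering idea in the abstract concrete, the following is a minimal sketch of how a containment estimate could be used to discard reference genomes before alignment. It is an illustration under stated assumptions, not Metalign's implementation: the function names (`kmers`, `stable_hash`, `estimate_containment`, `prefilter`), the k-mer length, the sketch size, and the cutoff are all hypothetical, and the sample's k-mers are held in an exact Python set rather than in a compact probabilistic structure. The quantity being estimated is the fraction of a genome's k-mers that also occur in the sample.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (Python's built-in str hash is salted per run)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def estimate_containment(genome, sample_kmers, k, sketch_size):
    """Estimate |K(genome) & K(sample)| / |K(genome)| from a bottom-s min hash sketch.

    Assumption: the sketch is the sketch_size genome k-mers with the smallest
    hash values; the estimate is the fraction of those found in the sample.
    """
    sketch = sorted(kmers(genome, k), key=stable_hash)[:sketch_size]
    if not sketch:
        return 0.0
    hits = sum(1 for km in sketch if km in sample_kmers)
    return hits / len(sketch)


def prefilter(references, sample, k=5, sketch_size=50, cutoff=0.5):
    """Keep only reference names whose estimated containment in the sample meets the cutoff.

    The defaults are illustrative toy values, not parameters taken from the paper.
    """
    sample_kmers = kmers(sample, k)
    return [
        name
        for name, genome in references.items()
        if estimate_containment(genome, sample_kmers, k, sketch_size) >= cutoff
    ]


if __name__ == "__main__":
    references = {
        "present": "ACGTACGGTCAAGTCCATGA",  # fully embedded in the toy sample below
        "absent": "TTTTTTTTTT",             # shares no 5-mer with the toy sample
    }
    sample = "GG" + references["present"] + "CC"
    print(prefilter(references, sample))
```

In this toy setting only the genome embedded in the sample survives the filter, so a subsequent alignment step would run against a much smaller reference set. Real data would call for a longer k-mer length and a cutoff chosen with sequencing error and coverage in mind.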
Introduction
Microorganisms are ubiquitous in almost every natural setting, including soil [1], ocean water [2], and the human body [3], and they play critical roles in the functioning of each of these systems [4, 5]. Traditional culture-based analysis of these microbes is confounded by the presence of many microorganisms that cannot be cultured in standard laboratory settings [4, 6]. Further, analysis of lab-cultured organisms fails to capture the complex community dynamics in real microbial ecosystems [4]. The field of metagenomics, or the analysis of whole microbial genomes recovered directly from their host environment via high-throughput sequencing, is vital to understanding microbial communities and their functions [4, 5].
Predicting the presence and relative abundance of taxa in a metagenomic sample (referred to as “taxonomic profiling”) is one of the primary means of analyzing a metagenomic sample [7, 8]. In comparison with metagenomic assembly, profiling is computationally simpler and more effective at identifying low-abundance organisms [8]. Metagenomic profiles can be obtained through read classification (where individual reads are assigned to taxa or organisms) or via the closely related technique of read
© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and th