Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads


SOFTWARE

Open Access

Marius Welzel¹, Anja Lange², Dominik Heider¹, Michael Schwarz¹, Bernd Freisleben¹, Manfred Jensen³, Jens Boenigk³ and Daniela Beisser³*

*Correspondence: daniela.beisser@uni-due.de
³ Department of Biodiversity, University of Duisburg-Essen, Essen, Germany
Full list of author information is available at the end of the article

Abstract

Background: Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high-throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system.

Results: We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow contains all analysis steps, from quality assessment through read assembly, dereplication, chimera detection, split-sample merging, and sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows, and uses Conda for version control of the utilized programs. Thus, Snakemake ensures reproducibility and Conda keeps program versions fixed. The encapsulation of rules and their dependencies supports hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub (https://github.com/MW55/Natrix) or as a Docker container on DockerHub (https://hub.docker.com/r/mw55/natrix).

Conclusion: Natrix is a user-friendly and highly extensible workflow for processing Illumina amplicon data.

Keywords: Amplicon sequencing, Operational Taxonomic Units, Amplicon Sequence Variants, Snakemake, Pipeline, Illumina
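To make one of the steps listed in the abstract concrete, dereplication can be pictured as collapsing identical reads into unique sequences with abundance counts. The following is a minimal Python sketch of that idea only. It is not the code Natrix runs, and the function name `dereplicate` and the toy reads are hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical sequences (case-insensitive) into unique ones.

    Returns a list of (sequence, abundance) tuples, most abundant first;
    ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter(seq.upper() for seq in reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input: six reads, three distinct sequences (hypothetical data).
reads = ["ACGT", "acgt", "TTGA", "ACGT", "GGCC", "TTGA"]
unique = dereplicate(reads)

for sequence, abundance in unique:
    print(f"{sequence}\t{abundance}")
```

Sorting by abundance is a common convention because later steps such as chimera detection and clustering often treat abundant sequences as more trustworthy; whether a given pipeline does so is a design choice and is assumed here for illustration.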

Background

Prokaryotes and microbial eukaryotes constitute a large fraction of the biodiversity on earth, but in many environments, their distribution and diversity are still unknown [21]. Sequencing of marker genes amplified from environmental samples can resolve some of

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a cre