Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets


RESEARCH ARTICLE

Open Access

Yi Yue1,2,3*†, Hao Huang1,3,4†, Zhao Qi1,2†, Hui-Min Dou2, Xin-Yi Liu2, Tian-Fei Han1,4, Yue Chen1,4, Xiang-Jun Song1,4, You-Hua Zhang1,2,3* and Jian Tu1,2,4*

* Correspondence: [email protected]. cn; [email protected]; [email protected]
† Yi Yue, Hao Huang and Zhao Qi contributed equally to this work.
1 Anhui Province Key Laboratory of Veterinary Pathobiology and Disease Control, Anhui Agricultural University, Hefei 230036, China
Full list of author information is available at the end of the article

Abstract

Background: Shotgun metagenomics, which relies on untargeted sequencing, can reveal the taxonomic profile and functions of unknown microorganisms in a sample and compensates for the limitations of amplicon sequencing. Binning assembled sequences into groups that represent individual microbial genomes is a key step and a major challenge in metagenomic research. Both supervised and unsupervised machine learning methods have been used for binning. Genome binning, an unsupervised approach, clusters contigs into individual genome bins without relying on any reference database. Many genome binning tools have emerged, and evaluating them is of great significance to microbiological research. In this study, we evaluate 15 genome binning tools (12 original binners and 3 refining binners) by comparing their performance on chicken gut metagenomic datasets and on the datasets of the first CAMI challenge.

Results: For the chicken gut metagenomic datasets, the original genome binners MetaBat, Groopm2 and Autometa performed better than the other original binners, and MetaWrap, which combined their binning results, generated the most high-quality genome bins. For the CAMI datasets, Groopm2 achieved the highest purity (> 0.9) with good completeness (> 0.8) and reconstructed the most high-quality genome bins among the original genome binners. MetaBat2 performed similarly to Groopm2, with higher completeness and lower purity. The refining binner DASTool predicted the most high-quality genome bins among all genome binners. Most genome binners performed well for unique strains; nonetheless, reconstructing common strains remains a substantial challenge for all of them.
Conclusions: We tested a set of currently available, state-of-the-art metagenomics hybrid binning tools and provide a guide for selecting tools for metagenomic binning, based on a comparison of their ranges of purity, completeness and adjusted Rand index and of the number of high-quality bins they reconstructed. We also summarize the available information relevant to future binning strategies.

Keywords: Metagenomics, Genome binning, Clustering, Benchmarking, Comparison
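The comparison criteria named above (purity, completeness and adjusted Rand index) can be illustrated on toy data. The following is a minimal sketch under stated assumptions, not the evaluation pipeline used in the study: the function names, the toy contigs, and the choice to measure purity and completeness in base pairs against each bin's majority genome are illustrative assumptions, and the adjusted Rand index here counts contigs rather than base pairs.

```python
from collections import Counter, defaultdict
from math import comb


def bin_purity_completeness(contigs):
    """Per-bin purity and completeness (hypothetical helper).

    contigs: iterable of (length_bp, true_genome, predicted_bin).
    Returns {bin: (majority_genome, purity, completeness)}, where
    purity = majority-genome bp in the bin / total bp in the bin, and
    completeness = majority-genome bp in the bin / total bp of that genome.
    """
    genome_bp = Counter()
    bin_bp = Counter()
    overlap_bp = defaultdict(Counter)  # bin -> genome -> shared bp
    for length, genome, bin_id in contigs:
        genome_bp[genome] += length
        bin_bp[bin_id] += length
        overlap_bp[bin_id][genome] += length

    result = {}
    for bin_id, per_genome in overlap_bp.items():
        genome, shared = per_genome.most_common(1)[0]
        result[bin_id] = (genome, shared / bin_bp[bin_id], shared / genome_bp[genome])
    return result


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index over contigs (unweighted by contig length)."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial and identical
    return (sum_cells - expected) / (max_index - expected)


# Toy example (made-up data): two genomes, two predicted bins.
toy_contigs = [
    (6000, "A", "bin1"),
    (2000, "A", "bin1"),
    (2000, "B", "bin1"),  # contaminating contig in bin1
    (8000, "B", "bin2"),
]
scores = bin_purity_completeness(toy_contigs)
ari = adjusted_rand_index(
    [genome for _, genome, _ in toy_contigs],
    [bin_id for _, _, bin_id in toy_contigs],
)
for bin_id, (genome, purity, completeness) in sorted(scores.items()):
    print(f"{bin_id}: genome={genome} purity={purity:.2f} completeness={completeness:.2f}")
print(f"ARI={ari:.2f}")
```

In this toy case the contaminating contig lowers the purity of `bin1` while the same misplaced contig lowers the completeness of `bin2`, which is the trade-off the results describe when one binner shows higher completeness and lower purity than another.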

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium
