RESEARCH ARTICLE
Open Access
In silico benchmarking of metagenomic tools for coding sequence detection reveals the limits of sensitivity and precision

Jonathan Louis Golob1 and Samuel Schwartz Minot2*
*Correspondence: [email protected] 2 Microbiome Research Initiative, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, E4‑100, Seattle, WA 98109‑1024, USA Full list of author information is available at the end of the article
Abstract

Background: High-throughput sequencing can establish the functional capacity of a microbial community by cataloging the protein-coding sequences (CDS) present in the metagenome of the community. The relative performance of different computational methods for identifying CDS from whole-genome shotgun sequencing is not fully established.

Results: Here we present an automated benchmarking workflow, using synthetic shotgun sequencing reads for which we know the true CDS content of the underlying communities, to determine the relative performance (sensitivity, positive predictive value or PPV, and computational efficiency) of different metagenome analysis tools for extracting the CDS content of a microbial community. Assembly-based methods are limited by coverage depth, with poor sensitivity for CDS at low coverage depth.
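The benchmarking described above reduces to comparing each tool's predicted CDS catalog against the known CDS content of a simulated community. As a minimal illustration only (not the published workflow; the function name and CDS identifiers below are hypothetical), sensitivity and PPV for one community can be computed with simple set operations:

def benchmark_cds_calls(true_cds: set, predicted_cds: set) -> dict:
    """Compare a predicted CDS catalog against the known (simulated) truth."""
    tp = len(true_cds & predicted_cds)   # true CDS that were detected
    fn = len(true_cds - predicted_cds)   # true CDS that were missed
    fp = len(predicted_cds - true_cds)   # predictions with no true counterpart
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # TP / (TP + FN)
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,          # TP / (TP + FP)
    }

# Toy example: four true CDS, three predictions, two of them correct.
truth = {"cds_A", "cds_B", "cds_C", "cds_D"}
called = {"cds_A", "cds_B", "cds_X"}
print(benchmark_cds_calls(truth, called))  # sensitivity 0.5, PPV ~0.67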
References with uneven depth of coverage are pruned, defined as those for which

    STD / Mean > 1.0        (1)

where STD is the standard deviation and Mean is the mean of the per-base coverage values for that reference.
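A minimal sketch of this evenness filter is given below, assuming the per-base depths have already been tallied for each reference; the function name, threshold argument, and example depth vectors are hypothetical and are not taken from the FAMLI implementation.

import statistics

def passes_evenness_filter(per_base_depth, max_ratio=1.0):
    """Keep a reference only if its coverage is sufficiently even.

    per_base_depth: read depth at each position along the reference.
    The reference is pruned when STD / Mean of the depths exceeds max_ratio.
    """
    mean_depth = statistics.mean(per_base_depth)
    if mean_depth == 0:
        return False  # no coverage at all
    std_depth = statistics.pstdev(per_base_depth)
    return (std_depth / mean_depth) <= max_ratio

# Evenly covered reference: kept as a candidate.
print(passes_evenness_filter([9, 10, 11, 10, 10, 9, 11]))   # True
# Coverage piled onto one short region: pruned.
print(passes_evenness_filter([0, 0, 0, 0, 0, 60, 58, 0]))   # False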
4. Calculate an initial score for a given query coming from a subject, using the alignment bitscores to weight the relative possibilities for that query and normalizing the scores to total 1 for each query.

5. Iteratively, until no further references are pruned or a maximum number of iterations is reached:
   (1) WEIGHTING and RENORMALIZING: the score of each query being from a given subject is weighted by the sum of scores for that subject from the prior iteration, and the scores are then renormalized to sum to 1 for each query.
   (2) PRUNING: determine the maximum likelihood for each query, and prune away all other likelihoods for that query that fall below a threshold.

6. Repeat filtering steps 2–3 using the set of deduplicated alignments resulting from step 4.

Here are some examples. For reference A and reference B that both have some aligning query reads, if there is uneven depth for reference A but relatively even depth across reference B, then reference A is removed from the candidate list while reference B is kept as a candidate. If query read #1 aligns equally well to reference A and reference C, but there is 2× more query read depth for reference A than for reference C across the entire sample, then reference C's alignment is removed from the list of candidates for query read #1.

A more detailed description of the method is available in Additional file 1. An interactive demonstration of our algorithm is available as a Jupyter notebook at https://github.com/FredHutch/FAMLI/blob/master/schematic/FAMLI-schematic-figure-GB.ipynb.
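Steps 4 and 5 amount to an iterative, bitscore-seeded redistribution of each query's weight among its candidate subjects. The sketch below illustrates that loop; it is not the FAMLI source code, and the alignment tuples, pruning rule, prune_frac threshold, and iteration cap are assumptions made for the example.

from collections import defaultdict

def iteratively_assign(alignments, prune_frac=0.9, max_iter=100):
    """Sketch of bitscore-weighted, iterative query-to-subject assignment.

    alignments: list of (query, subject, bitscore) tuples.
    Returns {query: {subject: score}} after iterative reweighting and pruning.
    """
    # Step 4: initial scores from bitscores, normalized to sum to 1 per query.
    scores = defaultdict(dict)
    for query, subject, bitscore in alignments:
        scores[query][subject] = float(bitscore)
    for query in scores:
        total = sum(scores[query].values())
        for subject in scores[query]:
            scores[query][subject] /= total

    # Step 5: iterate until no candidates are pruned or max_iter is reached.
    for _ in range(max_iter):
        # WEIGHTING: total evidence for each subject across all queries.
        subject_weight = defaultdict(float)
        for query in scores:
            for subject, score in scores[query].items():
                subject_weight[subject] += score
        # Reweight each query's candidates, then RENORMALIZE to sum to 1.
        for query in scores:
            for subject in scores[query]:
                scores[query][subject] *= subject_weight[subject]
            total = sum(scores[query].values())
            for subject in scores[query]:
                scores[query][subject] /= total
        # PRUNING: drop candidates far below each query's best score.
        pruned = False
        for query in scores:
            best = max(scores[query].values())
            keep = {s: v for s, v in scores[query].items() if v >= prune_frac * best}
            if len(keep) < len(scores[query]):
                scores[query] = keep
                pruned = True
        if not pruned:
            break
    return dict(scores)

# Query q1 hits references A and C equally well, but A is supported by many
# other reads, so q1 ends up assigned to A (the reference A vs. C example above).
alns = [("q1", "A", 50), ("q1", "C", 50)] + [(f"q{i}", "A", 50) for i in range(2, 6)]
print(iteratively_assign(alns))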
Comparison of FAMLI to HUMAnN2, SPAdes, top hit, and top 20

Simulation of microbial communities

Synthetic microbial communities