Next-generation sequencing for virus detection: covering all the bases



SHORT REPORT

Open Access

Marike Visser1,2†, Rachelle Bester2†, Johan T. Burger2 and Hans J. Maree1,2*

Abstract

Background: The use of next-generation sequencing has become an established method for virus detection. Efficient study design for accurate detection relies on the optimal amount of data representing a significant portion of a virus genome.

Findings: In this study, genome coverage at different sequencing depths was determined for a number of viruses, viroids, hosts and sequencing library types, using both read-mapping and de novo assembly-based approaches. The results highlighted the strength of ribo-depleted RNA and sRNA in obtaining saturated genome coverage with the least amount of data, while even though the poly(A)-selected RNA yielded virus-derived reads, it was insufficient to cover the complete genome of a non-polyadenylated virus. The ribo-depleted RNA data also outperformed the sRNA data in terms of the percentage of coverage that could be obtained, particularly with the de novo assembled contigs.

Conclusion: Our results suggest the use of ribo-depleted RNA in a de novo assembly-based approach for the detection of single-stranded RNA viruses. Furthermore, we suggest that sequencing one million reads will provide sufficient genome coverage specifically for closterovirus detection.

Keywords: CTV, Closterovirus, Genome coverage, GLRaV-3, Next-generation sequencing, Sequencing depth, Virus detection

* Correspondence: [email protected]
† Equal contributors
1 Agricultural Research Council, Infruitec-Nietvoorbij: Institute for Deciduous Fruit, Vines and Wine, Stellenbosch, South Africa
2 Department of Genetics, Stellenbosch University, Stellenbosch, South Africa

Findings

Next-generation sequencing (NGS) has proven to be a valuable tool for virus detection, discovery or diversity studies and has increased in popularity, while decreasing in cost. The percentage genome-wide coverage obtained, either through the mapping of reads or contigs (assembled reads) onto a reference genome, can serve as a form of virus detection. The confidence in a positive identification increases with greater coverage. Due to the variation in the number of reads associated with different genomic regions, an uneven coverage of the viral genome is often observed in RNA-Seq data. Variation in sequencing depth will, consequently, influence the percentage of genome coverage that can be obtained. It is therefore necessary to find the optimal amount of data needed to cover the complete or almost complete genome without generating an excess of sequence data.
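To make the notion of percentage genome coverage concrete, the following Python sketch computes the fraction of reference positions covered by at least one mapped read or contig, and repeats the calculation on random subsets of the alignments to mimic lower sequencing depths. It is an illustration only and not the pipeline used in this study: the function names, the 0-based half-open interval representation and the choice to subsample the alignments themselves (rather than the raw reads before mapping) are all assumptions made for brevity.

```python
import random


def genome_coverage_percent(alignments, genome_length):
    """Percentage of reference positions covered by at least one alignment.

    `alignments` is an iterable of (start, end) pairs, assumed here to be
    0-based, half-open intervals on the reference genome.
    """
    covered = [False] * genome_length
    for start, end in alignments:
        # Clip each interval to the reference so stray coordinates are ignored.
        for pos in range(max(start, 0), min(end, genome_length)):
            covered[pos] = True
    return 100.0 * sum(covered) / genome_length


def coverage_at_depths(alignments, genome_length, depths, seed=0):
    """Coverage percentage after randomly subsampling to each depth.

    Simplifying assumption: depth is counted in mapped alignments, not in
    total reads sequenced. Depths larger than the data set use all of it.
    """
    rng = random.Random(seed)
    alignments = list(alignments)
    results = {}
    for depth in depths:
        subset = rng.sample(alignments, min(depth, len(alignments)))
        results[depth] = genome_coverage_percent(subset, genome_length)
    return results


# Hypothetical toy data: two overlapping alignments on a 100-nt reference.
toy_alignments = [(0, 50), (25, 75)]
print(genome_coverage_percent(toy_alignments, 100))  # 75.0
print(coverage_at_depths(toy_alignments, 100, depths=[1, 2]))
```

Plotting the values returned by `coverage_at_depths` against depth gives a saturation curve; the depth at which the curve flattens is one way to estimate how much data is needed before additional reads stop improving coverage.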

Therefore, the aim of this study was to illustrate the influence of sequencing depth on virus and viroid genome coverage and to provide a guideline for the number of reads required to offer maximum possible genome coverage. In this study, two viruses from the family Closteroviridae were selected. They were known variants of Grapevine leafroll-associated virus 3 (GLRaV-3) (variant group II) and Citrus tristeza vir