A systematic comparison of chloroplast genome assembly tools
RESEARCH
Open Access
Jan A. Freudenthal1,2, Simon Pfaff1,3, Niklas Terhoeven1,2, Arthur Korte1, Markus J. Ankenbrand1,2,4* and Frank Förster1,3,5,6*
*Correspondence: [email protected]; [email protected]
1 Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, 97074 Würzburg, Germany
3 Department of Bioinformatics, University of Würzburg, Biozentrum, Am Hubland, 97074 Würzburg, Germany
Full list of author information is available at the end of the article
Abstract
Background: Chloroplasts are intracellular organelles that enable plants to conduct photosynthesis. They arose through the symbiotic integration of a prokaryotic cell into a eukaryotic host cell and still contain their own genomes with distinct genomic information. Plastid genomes accommodate essential genes and are regularly utilized in biotechnology or phylogenetics. Different assemblers that are able to assess the plastid genome have been developed. These assemblers often use data of whole genome sequencing experiments, which usually contain reads from the complete chloroplast genome.
Results: The performance of different assembly tools has never been systematically compared. Here, we present a benchmark of seven chloroplast assembly tools, capable of succeeding in more than 60% of known real data sets. Our results show significant differences between the tested assemblers in terms of generating whole chloroplast genome sequences and computational requirements. The examination of 105 data sets from species with unknown plastid genomes leads to the assembly of 20 novel chloroplast genomes.
Conclusions: We create docker images for each tested tool that are freely available for the scientific community and ensure reproducibility of the analyses. These containers allow the analysis and screening of data sets for chloroplast genomes using standard computational infrastructure. Thus, large-scale screening for chloroplasts within genomic sequencing data is feasible.
Keywords: Chloroplast, Genome, Assembly, Software, Benchmark
Introduction
General introduction and motivation
Chloroplasts are essential organelles present in the cells of plants and autotrophic protists, which enable the conversion of light energy into chemical energy via photosynthesis. They harbor their own prokaryotic type of ribosomes and a circular DNA genome that varies in size between 120 and 160 kbp [1]. Because of their small size, chloroplast genomes were among the first targets for sequencing projects. The first chloroplast genome sequences were obtained in 1986 [2, 3]. These early efforts elucidated the general genome organi-
© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source,