PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes
Qu et al. Plant Methods (2019) 15:50 https://doi.org/10.1186/s13007-019-0435-7
Open Access
SOFTWARE
Xiao-Jian Qu1,2, Michael J. Moore3, De-Zhu Li1* and Ting-Shuang Yi1*
Abstract

Background: Plastome (plastid genome) sequences provide valuable information for understanding the phylogenetic relationships and evolutionary history of plants. Although the rapid development of high-throughput sequencing technology has led to an explosion of plastome sequences, annotation remains a significant bottleneck. User-friendly batch annotation of multiple plastomes is urgently needed.

Results: We introduce Plastid Genome Annotator (PGA), a standalone command line tool that can perform rapid, accurate, and flexible batch annotation of newly generated target plastomes based on well-annotated reference plastomes. In contrast to existing tools, PGA uses reference plastomes as the query and unannotated target plastomes as the subject to locate genes, which we refer to as the reverse query-subject BLAST search approach. PGA accurately identifies gene and intron boundaries as well as intron loss. The program outputs GenBank-formatted files as well as a log file to assist users in verifying annotations. Comparisons against other available plastome annotation tools demonstrated the high annotation accuracy of PGA, with little or no post-annotation verification necessary. Likewise, we demonstrated the flexibility of reference plastomes within PGA by annotating the plastome of Rosa roxburghii using that of Amborella trichopoda as a reference. The program, user manual and example data sets are freely available at https://github.com/quxiaojian/PGA.

Conclusions: PGA facilitates rapid, accurate, and flexible batch annotation of plastomes across plants. For projects in which multiple plastomes are generated, the time savings for high-quality plastome annotation are especially significant.
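The reverse query-subject idea described in the Results can be illustrated with a minimal sketch. This is not PGA's actual code. It assumes that a BLAST search has already been run with reference genes as the query and the target plastome as the subject, and that the hits are available in the standard 12-column tabular format (`-outfmt 6`). The names `Hit`, `parse_tabular_hits`, and `best_hit_per_gene`, and the example rows, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    gene: str        # query ID: a gene taken from the annotated reference
    target: str      # subject ID: the unannotated target plastome
    start: int       # leftmost coordinate of the hit on the target
    end: int         # rightmost coordinate of the hit on the target
    strand: str      # "+" or "-", inferred from subject coordinate order
    bitscore: float


def parse_tabular_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse default BLAST tabular columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits are reported with sstart > send.
        strand = "+" if sstart <= send else "-"
        yield Hit(fields[0], fields[1], min(sstart, send),
                  max(sstart, send), strand, float(fields[11]))


def best_hit_per_gene(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-scoring target location for each reference gene."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.gene not in best or hit.bitscore > best[hit.gene].bitscore:
            best[hit.gene] = hit
    return best


# Hypothetical hits: reference genes (query) located on one target (subject).
rows = [
    "rbcL\ttarget1\t98.5\t1428\t21\t0\t1\t1428\t56000\t57427\t0.0\t2500",
    "psbA\ttarget1\t99.0\t1062\t10\t0\t1\t1062\t1500\t439\t0.0\t1900",
    "psbA\ttarget1\t80.0\t200\t40\t0\t1\t200\t90000\t90199\t1e-20\t150",
]
best = best_hit_per_gene(parse_tabular_hits(rows))
for gene, hit in sorted(best.items()):
    print(gene, hit.target, hit.start, hit.end, hit.strand)
```

Because each reference gene is the query, every hit directly yields a candidate location for that gene on the target. A real annotator would additionally need to refine gene and intron boundaries, which this sketch does not attempt.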
Keywords: PGA, Plastome, Batch annotation, Accuracy, BLAST, Software, Algorithms

Background

The plastid genomes (plastomes) of most photosynthetic seed plants are highly conserved and have a quadripartite structure, with a large single-copy region and a small single-copy region separated by two inverted repeat (IR) regions [1, 2]. The plastomes of photosynthetic seed plants are usually 120–160 kb in size [1] and contain 101–118 unique genes [2]. Plastome sequences have been widely applied in phylogenetics [3–5], population genetics and phylogeography [6, 7], and comparative genomics [2, 8]. In addition, the plastome is a key target for genetic engineering efforts to improve economic traits, resistance to diseases and pests, and stress resistance [9, 10]. The rapid development of high

*Correspondence: [email protected]; [email protected]
1 Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, 132 Lanhei Road, Kunming 650204, Yunnan, China
Full list of author information is available at the end of the article