An alignment-free heuristic for fast sequence comparisons with applications to phylogeny reconstruction
RESEARCH
Open Access
Sriram P. Chockalingam2*, Jodh Pannu1, Sahar Hooshmand1, Sharma V. Thankachan1 and Srinivas Aluru2,3
From 15th International Symposium on Bioinformatics Research and Applications (ISBRA’19), Barcelona, Spain, 3–6 June 2019
*Correspondence: [email protected]
2 Institute for Data Engineering and Science, Georgia Institute of Technology, 756 W Peachtree Street NW, Atlanta, USA
Full list of author information is available at the end of the article
Abstract

Background: Alignment-free methods for sequence comparison have become popular in many bioinformatics applications, particularly for estimating the sequence similarity measures used to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, ACS_k, have been shown to produce results as effective as those of multiple-sequence-alignment-based methods for the reconstruction of phylogenetic trees. Because computing ACS_k exactly takes O(n log^k n) time and is therefore impractical for large datasets, multiple heuristics that approximate ACS_k have been introduced.

Results: In this paper, we present a novel linear-time heuristic to approximate ACS_k. It is faster than computing the exact ACS_k and yields values closer to the exact ACS_k than previously published linear-time greedy heuristics. Using four real datasets containing both DNA and protein sequences, we evaluate the accuracy and runtime of our algorithm and demonstrate its applicability to phylogeny reconstruction. Our algorithm provides better accuracy than previously published heuristic methods, while performing comparably in phylogeny reconstruction.

Conclusions: Our method produces a better approximation of ACS_k and is applicable to the alignment-free comparison of biological sequences at highly competitive speed. The algorithm is implemented in the Rust programming language and the source code is available at https://github.com/srirampc/adyar-rs.

Keywords: Alignment-free methods, Sequence comparison, Phylogeny reconstruction
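To make the quantity being approximated concrete, the following is a minimal brute-force sketch of the k-mismatch average common substring measure. It assumes the commonly used definition: for each start position i in X, take the longest substring starting at i that matches some substring of Y with at most k mismatches, then average these lengths over all i. The function names `lcp_k` and `acs_k` are illustrative and are not taken from the adyar-rs source code. This is neither the exact O(n log^k n) algorithm nor the linear-time heuristic presented in the paper; it is only a slow reference that shows what both compute or approximate.

```rust
/// Length of the longest prefix shared by `a` and `b` when up to `k`
/// mismatching positions are tolerated (k = 0 gives the exact LCP).
fn lcp_k(a: &[u8], b: &[u8], k: usize) -> usize {
    let mut mismatches = 0;
    let mut len = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        if x != y {
            // Stop before the (k + 1)-th mismatch.
            if mismatches == k {
                break;
            }
            mismatches += 1;
        }
        len += 1;
    }
    len
}

/// Brute-force ACS_k(x, y) under the assumed definition: for each start
/// position i in x, take the longest k-mismatch match against any start
/// position j in y, then average over all i. Returns 0.0 for empty x.
fn acs_k(x: &[u8], y: &[u8], k: usize) -> f64 {
    if x.is_empty() {
        return 0.0;
    }
    let total: usize = (0..x.len())
        .map(|i| {
            (0..y.len())
                .map(|j| lcp_k(&x[i..], &y[j..], k))
                .max()
                .unwrap_or(0)
        })
        .sum();
    total as f64 / x.len() as f64
}
```

For example, `acs_k(b"ACGT", b"AGGT", 0)` evaluates to 1.0, while allowing one mismatch with `acs_k(b"ACGT", b"AGGT", 1)` raises it to 2.5. The measure is not symmetric in its arguments, and turning it into a distance for tree construction requires a further normalization step that is not shown here.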
© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication