Efficient implied alignment

  • PDF / 2,220,207 Bytes
  • 18 Pages / 595 x 794 pts Page_size
  • 101 Downloads / 211 Views

DOWNLOAD

REPORT


(2020) 21:296

METHODOLOGY ARTICLE

Open Access

Efficient implied alignment Alex J. Washburn* and Ward C. Wheeler *Correspondence: [email protected] Division of Invertebrate Zoology, American Museum of Natural History, 200 Central Park West, 10024-5192 New York, NY, USA

Abstract Background: Given a binary tree T of n leaves, each leaf labeled by a string of length at most k, and a binary string alignment function ⊗, an implied alignment can be generated to describe the alignment of a dynamic homology for T . This is done by first decorating each node of T with an alignment context using ⊗, in a post-order traversal, then, during a subsequent pre-order traversal, inferring on which edges insertion and deletion events occurred using those internal node decorations. Results: Previous descriptions of the implied alignment   algorithm suggest a technique of “back-propagation” with time complexity O k2 ∗ n2 . Here we describe an implied   alignment algorithm with complexity O k ∗ n2 . For well-behaved data, such as molecular sequences, the runtime approaches the best-case complexity of (k ∗ n). Conclusions: The reduction in the time complexity of the algorithm dramatically improves both its utility in generating multiple sequence alignments and its heuristic utility. Keywords: Dynamic homology, Implied alignment, Multiple string alignment, Phylogenetics, Sequence alignment, Tree alignment

Background Implied Alignment (IA) was proposed by [1] as an adjunct to Direct Optimization (DO) [2, 3] to be used in phylogenetic tree search to provide both verification and more rapid heuristic analysis. The method was originally implemented in later versions of MALIGN [4] and has been a component of POY [5–9] since its inception. A more formal description of the algorithm was presented in [6] and [2]. Although originally designed for alignmentfree phylogenetic analysis (dynamic homology, [10]), the procedure was first used as a stand-alone multiple sequence alignment (MSA) tool by [11] in their analysis of skink systematics. IA was originally described in the context of parsimony-based phylogenetic analysis and was later extended to probabilistic model-based approaches by [12] and its implementations were described by [9, 13]. Similar MSA approaches also based in probabilistic analysis have been described e.g by [14] and [15], and implemented in PRANK [16]. Whiting et al. found that IA was superior (in terms of tree optimality score) to other

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in