Phylosymmetric Algebras: Mathematical Properties of a New Tool in Phylogenetics


Michael Hendriksen1,2 · Julia A. Shore3

Received: 13 May 2020 / Accepted: 2 November 2020 © The Author(s) 2020

Abstract In phylogenetics, it is desirable for a set of rate matrices to be closed under matrix multiplication, as this makes it possible to find the corresponding set of transition matrices without computing matrix exponentials. It is also advantageous to have a small number of free parameters, as this reduces computation time in applications. We explore a method of building a rate matrix set from a rooted tree structure by assigning rates to internal tree nodes and states to the leaves, then defining the rate of change between two states as the rate assigned to the most recent common ancestor of those two states. We investigate the properties of these matrix sets from both a linear algebra and a graph theory perspective and show that any rate matrix set generated this way is closed under matrix multiplication. We then consider the consequences of setting two rates assigned to internal tree nodes to be equal. This methodology could be used to develop parameterised models of amino acid substitution that have a small number of parameters but still convey biological meaning.

Keywords Phylogenetic methods · Graph theory · Matrix algebras · Rate matrices · Matrix models · Rooted trees

Substantial parts of MH’s research were carried out at both WSU and HHU.

Michael Hendriksen [email protected]

1 Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
2 Institut für Molekulare Evolution, Heinrich-Heine Universität, Düsseldorf, Germany
3 University of Tasmania, Churchill Avenue, Sandy Bay, TAS 7005, Australia

1 Introduction

Phylogenetics is the study of constructing phylogenetic trees that represent evolutionary history. A common approach in this field is to analyse RNA, DNA and protein sequence data using continuous-time Markov chains to measure how frequently point mutations occur. From a continuous-time Markov chain, one can generate transition matrices (whose entries represent the probabilities of a change of state over a fixed time period) and rate matrices (whose entries represent the rates of change between states). Transition matrices in phylogenetics are typically classified as either empirical, where the transition probabilities are values calculated by analysing sequence data, or parameterised, where the transition probabilities are represented by free parameters chosen to fit the data as needed (Yang 2014). Because a parameterised transition matrix contains free parameters, it can be thought of as a set of transition matrices. Such a set is often referred to as a model; the set of transition matrices is denoted by M and the set of corresponding rate matrices is denoted by Q. Parameterised models are oft