mPartition: A Model-Based Method for Partitioning Alignments
ORIGINAL ARTICLE
Thu Le Kim (1,2) · Vinh Le Sy (1)
Received: 31 December 2019 / Accepted: 8 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Maximum likelihood (ML) analysis of nucleotide or amino-acid alignments is widely used to infer evolutionary relationships among species. Computing the likelihood of a phylogenetic tree from such alignments is a complicated task because the evolutionary processes typically vary across sites. A number of studies have shown that partitioning alignments into sub-alignments of sites, where each sub-alignment is analyzed using a different model of evolution (e.g., GTR + I + G), is a sensible strategy. Current partitioning methods group sites into subsets based on the inferred rates of evolution at the sites. However, these do not provide sufficient information to adequately reflect the substitution processes of characters at the sites. Moreover, the site rate-based methods group all invariant sites into one subset, potentially resulting in wrong phylogenetic trees. In this study, we propose a partitioning method, called mPartition, that combines not only the evolutionary rates but also substitution models at sites to partition alignments. Analyses of different partitioning methods on both real and simulated datasets showed that mPartition was better than the other partitioning methods tested. Notably, mPartition overcame the pitfall of grouping all invariant sites into one subset. Using mPartition may lead to increased accuracy of ML-based phylogenetic inference, especially for multiple loci or whole genome datasets.

Keywords
Alignment partitioning · Maximum likelihood phylogenetic inference · Substitution model · Site rate model
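The pitfall of rate-based grouping mentioned in the abstract can be illustrated with a small toy sketch. It is not mPartition, nor any published rate-based method: the function name `partition_by_rate_proxy`, the toy alignment, and the use of "number of distinct states minus one" as a stand-in for an inferred site rate are all assumptions made for illustration only. Under this crude proxy, every invariant column receives the same value and so falls into a single subset, regardless of which character it holds.

```python
from collections import defaultdict


def partition_by_rate_proxy(alignment):
    """Group site indices by a crude per-site rate proxy.

    Assumption (illustration only): the proxy is the number of distinct
    characters in the column minus one, ignoring gap symbols. Real methods
    estimate site rates under an evolutionary model instead.
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")

    subsets = defaultdict(list)
    # zip(*alignment) walks the alignment column by column (site by site).
    for index, column in enumerate(zip(*alignment)):
        states = set(column) - {"-", "?"}
        proxy = max(len(states) - 1, 0)
        subsets[proxy].append(index)
    return dict(subsets)


# Hypothetical toy alignment: four sequences, six sites.
toy_alignment = [
    "AAGTCA",
    "AAGTTA",
    "AAGCTA",
    "ACGCGA",
]

# Sites 0, 2 and 5 are invariant and all land in the subset with proxy 0,
# even though sites 0 and 5 hold 'A' while site 2 holds 'G'.
print(partition_by_rate_proxy(toy_alignment))
# → {0: [0, 2, 5], 1: [1, 3], 2: [4]}
```

The sketch only shows why a rate value by itself cannot separate invariant sites; distinguishing them would require additional per-site information, such as the substitution model, which is the direction the abstract describes.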
Background
Phylogenetic inference is a powerful approach to study the evolutionary relationships among species. The maximum likelihood (ML) method is among the most popular approaches to infer phylogenetic trees from nucleotide and amino-acid sequences (Felsenstein 2003; Lemey et al. 2009). The accuracy of ML-based phylogenetic inference relies on a number of factors including the size of alignments (i.e., the number of sites and sequences), tree building methods (e.g., IQ-TREE or PhyML), and models of sequence evolution (e.g., GTR + I + G4 or HKY + G4). The advancement of sequencing technologies has created large datasets for inferring phylogenetic trees. Efficient ML methods have been developed to build phylogenetic trees from large datasets; these include PhyML (Guindon and Gascuel 2003), IQPNNI (Vinh and von Haeseler 2004), RAxML (Stamatakis 2015), and IQ-TREE (Minh et al. 2020). Using different models of evolution to analyze a given dataset might produce si

Handling Editor: Arndt von Haeseler.
* Vinh Le Sy [email protected]
1 University of Engineering and Technology, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi 10000, Vietnam
2 Hanoi University of Science and Technology, 1st Dai Co Viet, Hai Ba Trung, Hanoi 10000, Vietnam