A cubic algorithm for the generalized rank median of three genomes
- PDF / 3,642,572 Bytes
- 16 Pages / 595.276 x 790.866 pts Page_size
- 33 Downloads / 158 Views
Algorithms for Molecular Biology Open Access
RESEARCH
A cubic algorithm for the generalized rank median of three genomes Leonid Chindelevitch1* , Sean La1 and Joao Meidanis2
Abstract Background: The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable ones has been that of finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely, the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes can be computed exactly in polynomial time O(nω ) , where ω ≤ 3 , with respect to this distance, when the median is allowed to be an arbitrary orthogonal matrix. Results: We define the five fundamental subspaces depending on three input genomes, and use their properties to show that a particular action on each of these subspaces produces a median. In the process we introduce the notion of M-stable subspaces. We also show that the median found by our algorithm is always orthogonal, symmetric, and conserves any adjacencies or telomeres present in at least 2 out of 3 input genomes. Conclusions: We test our method on both simulated and real data. We find that the majority of the realistic inputs result in genomic outputs, and for those that do not, our two heuristics perform well in terms of reconstructing a genomic matrix attaining a score close to the lower bound, while running in a reasonable amount of time. We conclude that the rank distance is not only theoretically intriguing, but also practically useful for median-finding, and potentially ancestral genome reconstruction. Keywords: Comparative genomics, Ancestral genome reconstruction, Phylogenetics, Rank distance Background The genome median problem consists of computing a genome M that minimizes the sum d(A, M) + d(B, M) + d(C, M) , where A, B, and C are three given genomes and d(·, ·) is a distance metric that measures how far apart two genomes are, and is commonly chosen to correlate with evolutionary time. In this paper, we present a polynomial-time algorithm for the computation of a median for the rank distance. We call it a generalized median because, despite attaining a lower bound on the best score with respect to the rank distance, it may not be a genome in all cases. However, we report on experiments that show that the median is genomic in the majority of the cases we examined, including real *Correspondence: [email protected] 1 School of Computing Science, Simon Fraser University, Burnaby, Canada Full list of author information is available at the end of the article
genomes and artificial genomes created by simulation, and when it is not, a genome close to the median can be found via an efficient post-processing heuristic. This result is a significant improvement on the first algorithm for generalized medians with respect to the rank distance [1], which
Data Loading...