Comparison of Four Methods for Inferring Additive Trees from Incomplete Dissimilarity Matrices
The problem of inference of an additive tree from an incomplete dissimilarity matrix is known to be very delicate. As a solution to this problem, it has been suggested either to estimate the missing entries of a given partial dissimilarity matrix prior to
- PDF / 997,305 Bytes
- 8 Pages / 439.37 x 666.142 pts Page_size
- 74 Downloads / 155 Views
1
Introduction
.Incomplete dissimilarity data can arise in a variety of practical situations. For example, this is often the case in molecular biology, and more precisely in phylogenetics, where an additive or a phylogenetic tree represents an intuitive model of species evolution. The presence of missing data in a distance or dissimilarity matrix among species or taxa can be due to the lack of biological material, imprecision of employed experimental methods, or to a combination of unpredictable factors. Unfortunately, the vast majority of the widely used additive tree fitting techniques, as for example the NeighborJoining (Saitou and Nei, 1987), Fitch (Felsenstein, 1997), or BioNJ (Gascuel, 1997) algorithms, cannot be launched unless a complete dissimilarity matrix is available. To solve this challenging problem, some methods have been recently proposed. There exist in the literature two types of methods, using either indirect or direct estimation of missing values, for inferring additive trees from incomplete dissimilarity matrices. The first type of methods, or indirect estimation, relies on the assessing missing cells prior to phylogenetic reconstruction using the properties of path-length matrices representing trees. An additive tree
K. Jajuga et al. (eds.), Classification, Clustering, and Data Analysis © Springer-Verlag Berlin Heidelberg 2002
372
can then be inferred from a complete dissimilarity matrix by means of any available tree-fitting algorithm. The second type of methods handling missing values, or direct estimation, consists of reconstructing a tree directly from an incomplete dissimilarity matrix by using a particular tree-building procedure. As far as the direct estimation techniques are concerned, I have to mention the work by De Soete (1984) and Landry et al. (1996), who showed how to infer additive trees from partial data using either the ultrametric inequality: d(i,j)
~
max{d(i,k);d(j,k)},for any i,jand k,
(1)
or the four-point condition (Buneman 1971): d(i,j)
+ d(k, l)
~
max{d(i, k)
+ d(j, l); d(i, I) + d(j, k)}, for
any i,j, kand l (2)
Using the properties of the ultrametric inequality and the four-point condition, one can fill out incomplete matrices; the missing cells can actually be estimated through the combinations of the available ones. As to the direct reconstruction, two tree-building algorithms allowing for missing cells in dissimilarity matrices have been recently proposed by different authors; the Triangle method of Gu{moche and Leclerc (2001), see also Guenoche and Grandcolas (1999), relies on a constructive approach, whereas the MW procedure of Makarenkov and Leclerc (1999) is based on a least-squares approximation. This paper aims first at introducing a new original method for direct reconstruction of additive trees from partial matrices. The second goal consists of proving the efficiency of the proposed method by comparing it to the Ultrametric and Additive indirect procedures, as well as to the Triangle direct reconstruction method. In order to compare the new
Data Loading...