Comparison of Four Methods for Inferring Additive Trees from Incomplete Dissimilarity Matrices

The problem of inference of an additive tree from an incomplete dissimilarity matrix is known to be very delicate. As a solution to this problem, it has been suggested either to estimate the missing entries of a given partial dissimilarity matrix prior to

PDF / 997,305 Bytes
8 Pages / 439.37 x 666.142 pts Page_size
74 Downloads / 163 Views

DOWNLOAD

REPORT

1

Introduction

.Incomplete dissimilarity data can arise in a variety of practical situations. For example, this is often the case in molecular biology, and more precisely in phylogenetics, where an additive or a phylogenetic tree represents an intuitive model of species evolution. The presence of missing data in a distance or dissimilarity matrix among species or taxa can be due to the lack of biological material, imprecision of employed experimental methods, or to a combination of unpredictable factors. Unfortunately, the vast majority of the widely used additive tree fitting techniques, as for example the NeighborJoining (Saitou and Nei, 1987), Fitch (Felsenstein, 1997), or BioNJ (Gascuel, 1997) algorithms, cannot be launched unless a complete dissimilarity matrix is available. To solve this challenging problem, some methods have been recently proposed. There exist in the literature two types of methods, using either indirect or direct estimation of missing values, for inferring additive trees from incomplete dissimilarity matrices. The first type of methods, or indirect estimation, relies on the assessing missing cells prior to phylogenetic reconstruction using the properties of path-length matrices representing trees. An additive tree

K. Jajuga et al. (eds.), Classification, Clustering, and Data Analysis © Springer-Verlag Berlin Heidelberg 2002

372

can then be inferred from a complete dissimilarity matrix by means of any available tree-fitting algorithm. The second type of methods handling missing values, or direct estimation, consists of reconstructing a tree directly from an incomplete dissimilarity matrix by using a particular tree-building procedure. As far as the direct estimation techniques are concerned, I have to mention the work by De Soete (1984) and Landry et al. (1996), who showed how to infer additive trees from partial data using either the ultrametric inequality: d(i,j)

~

max{d(i,k);d(j,k)},for any i,jand k,

(1)

or the four-point condition (Buneman 1971): d(i,j)

+ d(k, l)

~

max{d(i, k)

+ d(j, l); d(i, I) + d(j, k)}, for

any i,j, kand l (2)

Using the properties of the ultrametric inequality and the four-point condition, one can fill out incomplete matrices; the missing cells can actually be estimated through the combinations of the available ones. As to the direct reconstruction, two tree-building algorithms allowing for missing cells in dissimilarity matrices have been recently proposed by different authors; the Triangle method of Gu{moche and Leclerc (2001), see also Guenoche and Grandcolas (1999), relies on a constructive approach, whereas the MW procedure of Makarenkov and Leclerc (1999) is based on a least-squares approximation. This paper aims first at introducing a new original method for direct reconstruction of additive trees from partial matrices. The second goal consists of proving the efficiency of the proposed method by comparing it to the Ultrametric and Additive indirect procedures, as well as to the Triangle direct reconstruction method. In order to compare the new

Data Loading...

Comparison of Four Methods for Inferring Additive Trees from Incomplete Dissimilarity Matrices

Recommend Documents

A comprehensive comparison of four methods for extracting lipids from Arabidopsis tissues

Inferring from Samples

Four-Quark States from Functional Methods

Comparison of Four Invasive Methods for Diagnosis of Helicobacter pylori Infection: Fluorescence in situ Hybridization,

Comparison of four different reduction methods for anterior dislocation of the shoulder

Phylogenetic Trees Dissimilarity Measure Based on Strict Frequent Splits Set and Its Application for Clustering

Maximum Likelihood Under Incomplete Information: Toward a Comparison of Criteria

Comparison of Sensitivity and Timing of Early Signal Detection of Four Frequently Used Signal Detection Methods

Inferring Unstable Equilibrium Configurations from Observed Dynamics

Inferring Performance from Code: A Review

Numerical Methods for Structured Matrices and Applications The Georg

Identification of Structural Matrices of Shear Buildings Using Ambient Vibration Tests with Incomplete Measurements