The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis


Xiongwu Wu
Laboratory of Biophysical Chemistry, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
Email: [email protected]

Yidong Chen
National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
Email: [email protected]

Bernard R. Brooks
Laboratory of Biophysical Chemistry, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
Email: [email protected]

Yan A. Su
Department of Pathology, Loyola University Medical Center, Maywood, IL 60153, USA
Email: [email protected]

Received 28 February 2003; Revised 25 July 2003

An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experimental data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering such as reduced projection in similarities, noise, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method, as well as the hierarchical clustering method, the K-means clustering method, and the self-organizing map method, to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).

Keywords and phrases: data cluster, clustering method, microarray, gene expression, classification, model data sets.

1. INTRODUCTION

Data analysis is a key step in obtaining information from large-scale gene expression data. Many methods and algorithms have been developed for analyzing the gene expression matrix [1, 2, 3, 4, 5, 6, 7, 8, 9]. Clustering genes to find coregulated and functionally related groups is particularly interesting when the complete set of an organism’s genes is available. A reasonable hypothesis is that genes with similar expression profiles, that is, coexpressed genes, may have something in common in their regulatory mechanisms, that is, they may be coregulated. Therefore, by clustering together genes with similar expression profiles, one can find groups of potentially coregulated genes and search for putative regulatory signals.

So far, many clustering methods have been developed. They can be divided into two categories: supervised and unsupervised methods. This work focuses on unsupervised data clustering. Some widely used methods in this category are the hierarchical clustering method [6], the K-means clustering method [10], and the self-organizing map clustering method