DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach


Alain B. Tchagang¹ and Ahmed H. Tewfik²

¹ Department of Biomedical Engineering, Institute of Technology, University of Minnesota, 312 Church Street SE, Minneapolis, MN 55455, USA
² Department of Electrical and Computer Engineering, Institute of Technology, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA

Received 15 May 2005; Revised 5 October 2005; Accepted 1 December 2005

Biclustering algorithms refer to a distinct class of clustering algorithms that perform simultaneous row-column clustering. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. When dealing with DNA microarray experimental data for example, the goal of biclustering algorithms is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed biclustering algorithms can be used to search for all biclusters with constant values, biclusters with constant values on rows, biclusters with constant values on columns, and biclusters with coherent values from a set of data in a timely manner and without solving any optimization problem. We also show how one of the proposed biclustering algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, while almost all previous biclustering approaches will miss some.

Copyright © 2006 Hindawi Publishing Corporation. All rights reserved.
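To make the four value-based bicluster types named above concrete, the following minimal sketch classifies a given submatrix. It is not the search algorithm proposed in this paper: it only checks one candidate submatrix against the commonly used additive models (constant: a_ij = mu; constant on rows: a_ij = mu + alpha_i; constant on columns: a_ij = mu + beta_j; coherent: a_ij = mu + alpha_i + beta_j). The function name `bicluster_type`, the tolerance `tol`, and the toy matrix are illustrative assumptions, and coherent evolution is not covered.

```python
import numpy as np


def bicluster_type(sub, tol=1e-9):
    """Classify a submatrix under the additive bicluster models.

    Hypothetical helper for illustration; `tol` is an assumed
    tolerance for floating-point comparison.
    """
    sub = np.asarray(sub, dtype=float)

    # Constant values: every entry is the same number.
    if np.ptp(sub) <= tol:
        return "constant values"

    # Constant values on rows: each gene is flat across the conditions.
    if np.all(np.ptp(sub, axis=1) <= tol):
        return "constant values on rows"

    # Constant values on columns: each condition is flat across the genes.
    if np.all(np.ptp(sub, axis=0) <= tol):
        return "constant values on columns"

    # Coherent values (additive): a_ij = mu + alpha_i + beta_j, which holds
    # exactly when a_ij - rowmean_i - colmean_j + overall mean is zero.
    residue = (
        sub
        - sub.mean(axis=1, keepdims=True)
        - sub.mean(axis=0, keepdims=True)
        + sub.mean()
    )
    if np.all(np.abs(residue) <= tol):
        return "coherent values (additive)"

    return "none of these types"


# Toy expression matrix: rows are genes, columns are conditions.
data = np.array([
    [1, 2, 3, 9],
    [2, 3, 4, 0],
    [4, 5, 6, 7],
    [8, 1, 5, 2],
])

# Select a subgroup of genes and a subgroup of conditions.
genes, conditions = [0, 1, 2], [0, 1, 2]
print(bicluster_type(data[np.ix_(genes, conditions)]))
```

In the toy matrix, genes 0 to 2 restricted to conditions 0 to 2 differ only by a per-gene offset, so the submatrix is coherent even though the full matrix is not. Enumerating all such submatrices efficiently is the harder problem that the paper addresses.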

1. INTRODUCTION

One of the major goals of gene expression data analysis is to uncover genetic pathways, that is, chains of genetic interactions. For example, a researcher may be interested in identifying the genes that contribute to a disease. This task is difficult because subgroups of genes display similar activation patterns only under certain experimental conditions. Genes that are coregulated or coexpressed under a subset of conditions will behave differently under other conditions. Finding genetic pathways may therefore benefit from identifying clusters of genes that are coexpressed under subsets of conditions as opposed to all conditions.

Gene expression data is typically arranged in a data matrix, with rows corresponding to genes and columns corresponding to experimental conditions. Conditions can be different environmental conditions or different time points corresponding to one or more environmental conditions. The (n, m)th entry of the gene expression matrix represents the expression level of the gene corresponding to row n under the specific condition corresponding to column m. The numerical value of the entry is usually the logarithm of the relative amount of the mRNA of the gene under the specific condition. By simultaneously clustering the ro