Enhancing gene expression clustering analysis using tangent transformation
ORIGINAL ARTICLE
Xin Xu
Received: 20 August 2011 / Accepted: 3 January 2012 / Published online: 21 January 2012
© Springer-Verlag 2012
Abstract Even though extensive work has been done on clustering gene expression data, no existing algorithm evaluates gene expression coherence simultaneously by both regulation direction and relative proportion. For example, density-based algorithms group genes with similar expression levels together and may separate genes whose expression levels differ greatly in value but vary in a fixed proportion relative to one another. To measure profile coherence in regulation proportion as well as regulation direction simultaneously, we propose a novel tangent transformation method. Experimental results indicate that the tangent transformation method significantly enhances gene expression clustering results. The method can be applied flexibly to either global clustering or biclustering, in either unsupervised or supervised scenarios.
Keywords Clustering · Gene expression analysis
X. Xu (✉) Science and Technology on Information System Engineering Laboratory, Nanjing 210007, China e-mail: [email protected]

1 Motivation
We assume that the regulation of genes is in perfect coherence, in all or just a subset of conditions, if it is (1) coherent in regulation direction and (2) coherent in the proportion of expression level change. Extensive work has been done on clustering gene expression data. However, there is as yet no unified measure that considers both properties of regulation coherence simultaneously during clustering. Clustering methods such as CLICK [11] and GEMS [17] are density-based and cluster genes based on similarity in expression levels. Tendency-based methods [3, 7, 15] drop the constraint on expression level similarity and identify genes whose expression levels rise and fall synchronously or oppositely in a correlated manner, but they generally do not check whether the genes in a cluster vary in a fixed proportion relative to one another.

For illustration, let us denote the expression profiles of two genes $g_1$ and $g_2$ in space $S$ by $P_{g_1,S}$ and $P_{g_2,S}$, where $S$ can be either a global space (all conditions) or a subspace (a subset of conditions). Here, the conditions refer to either time points or independent samples. If $g_1$ and $g_2$ have perfect coherence in space $S$, then the ratio of the expression level change of $g_1$ to that of $g_2$ across any two conditions in $S$ is a constant $s_1$. Equivalently, the profiles of $g_1$ and $g_2$ can be made to overlap by a shifting and scaling transformation: $P_{g_1,S} = s_1 P_{g_2,S} + s_2$. It is a perfect positive coherence if $s_1 > 0$ and a perfect negative coherence if $s_1 < 0$. Of course, within a gene cluster of perfect coherence, the values of $s_1$ and $s_2$ vary between gene pairs. As an example, consider the profiles of three genes $g_1$, $g_2$ and $g_3$ on conditions $c_2$, $c_4$, $c_8$ and $c_{10}$ in Fig. 1. The profiles of $g_1$ and
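
The perfect-coherence definition above (one profile equal to a scaled and shifted copy of the other) can be checked numerically. The following is a minimal sketch under stated assumptions, not the tangent transformation method itself: the function names `fit_shift_scale` and `coherence_type`, the tolerance `tol`, and the example profile values are all hypothetical and do not come from the article. It estimates $s_1$ and $s_2$ by least squares and reports the sign of $s_1$ only when the fit is exact up to the tolerance.

```python
import numpy as np


def fit_shift_scale(p1, p2):
    """Least-squares fit of p1 ~ s1 * p2 + s2.

    Returns (s1, s2, max_abs_residual). Names are illustrative only.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Design matrix with columns [p2, 1], so the solution vector is (s1, s2).
    design = np.column_stack([p2, np.ones_like(p2)])
    (s1, s2), *_ = np.linalg.lstsq(design, p1, rcond=None)
    residual = float(np.max(np.abs(design @ np.array([s1, s2]) - p1)))
    return float(s1), float(s2), residual


def coherence_type(p1, p2, tol=1e-8):
    """Classify a gene pair by the sign of s1 when the affine fit is exact."""
    p2_arr = np.asarray(p2, dtype=float)
    # A constant profile has no expression change, so no proportion is defined.
    if np.ptp(p2_arr) <= tol:
        return "not perfectly coherent"
    s1, _, residual = fit_shift_scale(p1, p2)
    if residual > tol or abs(s1) <= tol:
        return "not perfectly coherent"
    if s1 > 0:
        return "perfect positive coherence"
    return "perfect negative coherence"


# Hypothetical profiles over four conditions (values invented for illustration).
g2 = [1.0, 3.0, 2.0, 5.0]
g1 = [2.0 * v + 1.0 for v in g2]    # s1 = 2, s2 = 1
g3 = [-0.5 * v + 4.0 for v in g2]   # s1 = -0.5, s2 = 4

print(coherence_type(g1, g2))
print(coherence_type(g3, g2))
```

Real expression data are noisy, so an exact-fit test like this would rarely succeed in practice; it is meant only to make the roles of $s_1$ (scaling) and $s_2$ (shifting) concrete.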