A New Efficient Approach in Clustering Ensembles



Abstract. Previous clustering ensemble algorithms usually use a consensus function to obtain a final partition from the outputs of the initial clusterings. In this paper, we propose a new clustering ensemble method that generates a new feature space from the initial clustering outputs. Multiple runs of an initial clustering algorithm such as k-means generate a new feature space that is significantly better than the pure or normalized feature space. Consequently, running a simple clustering algorithm on the generated feature space yields a final partition significantly better than one obtained from the pure data. For the initial clustering runs, we use a modification of k-means named "Intelligent k-means", which is defined especially for clustering ensembles. The results of the proposed method are presented using both simple k-means and intelligent k-means. Fast convergence and appropriate behavior are the most interesting properties of the proposed method. Experimental results on real data sets show the effectiveness of the proposed method.

Keywords: Clustering ensemble, feature space, intelligent k-means, initial points.

1 Introduction

No single clustering algorithm performs best on all data sets, and choosing one for a given data set requires both expertise and insight. Therefore, instead of a single clustering algorithm, a cluster ensemble can be used [1, 2]. To combine clusterings in a robust and stable manner, one needs a diverse set of component partitions, which is usually obtained from one of several sources:

1) Using different clustering algorithms to produce partitions for combination [4].
2) Changing the initialization or other parameters of a clustering algorithm [3, 5].
3) Using different features, via feature extraction, for subsequent clustering [1, 6, 7].
4) Partitioning different subsets of the original data [8, 9, 10, 11, 12, 13].

All of the above mechanisms try to produce more diversity by considering the data from different aspects. The major difficulty in clustering ensembles is the consensus function, the algorithm that combines the component partitions into a final partition; in other words, finding a consensus partition from the output partitions of the various clustering algorithms. The combination of multiple clusterings can also be viewed as finding a median partition with respect to the given partitions, which is proven to be NP-complete [14].

H. Yin et al. (Eds.): IDEAL 2007, LNCS 4881, pp. 395–405, 2007. © Springer-Verlag Berlin Heidelberg 2007
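The second diversity mechanism listed above, varying the initialization, is the one most relevant to ensembles built from repeated k-means runs. The following is a minimal illustrative sketch (not code from the paper; the helper names `kmeans` and `co_membership` are ours): several k-means runs with different random seeds produce different partitions, and their disagreement can be measured on co-membership matrices so that arbitrary label permutations do not matter.

```python
import numpy as np

def kmeans(X, k, seed, iters=50):
    """Minimal Lloyd's k-means; returns a label vector of length len(X)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def co_membership(labels):
    """n x n binary matrix: entry (i, j) is 1 iff points i and j share a cluster."""
    return (labels[:, None] == labels[None, :]).astype(int)

rng = np.random.default_rng(42)
X = rng.uniform(size=(60, 2))                     # no strong structure: runs vary
partitions = [kmeans(X, k=3, seed=s) for s in range(5)]

# Diversity between two runs: fraction of point pairs whose co-membership differs.
disagree = (co_membership(partitions[0]) != co_membership(partitions[1])).mean()
```

Comparing co-membership matrices rather than raw labels is a standard trick, since two runs may find the same grouping under permuted cluster indices.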


J. Azimi, M. Abdoos, and M. Analoui

There are many types of consensus functions, such as hypergraph partitioning [1, 6], the voting approach [5, 8, 15], the quadratic mutual information algorithm [16], and co-association based functions [2, 17, 18]. In this paper, we propose a new consensus function for clustering ensembles, named the Labeling Algorithm. Instead of using previous consensus functions, which store the results of each k-means run and then obtain the final partition from them, we generate a new feature from the result of each k-means run.
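The idea of turning each initial clustering run into a new feature can be sketched as follows. This is an illustrative reconstruction under our own assumptions, not the paper's exact Labeling Algorithm: the helpers `kmeans` and `label_feature_space` are hypothetical names, and each run's label vector is simply stacked as one column of the generated feature space, which is then clustered again to obtain the final partition.

```python
import numpy as np

def kmeans(X, k, seed, iters=50):
    """Minimal Lloyd's k-means; returns a label vector of length len(X)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def label_feature_space(X, k, n_runs):
    """Each k-means run contributes one new (categorical) feature: its labels."""
    return np.column_stack([kmeans(X, k, seed=s) for s in range(n_runs)])

# Two tight, well-separated blobs; the generated space has one column per run.
X = np.vstack([np.random.default_rng(0).normal(0.0, 0.1, (20, 2)),
               np.random.default_rng(1).normal(5.0, 0.1, (20, 2))])
F = label_feature_space(X, k=2, n_runs=10)

# A simple clustering of the generated space yields the final partition.
final = kmeans(F.astype(float), k=2, seed=0)
```

On such well-separated data every run recovers the two blobs (up to label permutation), so all points of one blob share an identical row in the generated space, which makes the final clustering step easy; on harder data the columns disagree and the second clustering acts as the consensus step.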