A New Efficient Approach in Clustering Ensembles



Abstract. Previous clustering ensemble algorithms usually use a consensus function to obtain a final partition from the outputs of the initial clusterings. In this paper, we propose a new clustering ensemble method that generates a new feature space from the initial clustering outputs. Multiple runs of an initial clustering algorithm such as k-means generate a new feature space that is significantly better than the pure or normalized feature space. Consequently, running a simple clustering algorithm on the generated feature space yields a final partition significantly better than one obtained from the pure data. For the initial clustering runs, we use a modification of k-means named "Intelligent k-means", which is defined especially for clustering ensembles. The results of the proposed method are presented using both simple k-means and intelligent k-means. Fast convergence and appropriate behavior are the most interesting properties of the proposed method. Experimental results on real data sets show the effectiveness of the proposed method.

Keywords: Clustering ensemble, feature space, intelligent k-means, initial points.

1 Introduction

No single clustering algorithm performs best on all data sets, and choosing one for a given data set requires both expertise and insight. Therefore, instead of a single clustering algorithm, a cluster ensemble can be used [1, 2]. To combine clusterings in a robust and stable manner, one needs a diverse set of component partitions, which is usually obtained from one of several sources:

1) Using different clustering algorithms to produce partitions for combination [4].
2) Changing the initialization or other parameters of a clustering algorithm [3, 5].
3) Using different features, via feature extraction, for subsequent clustering [1, 6, 7].
4) Partitioning different subsets of the original data [8, 9, 10, 11, 12, 13].

All of the above mechanisms try to produce more diversity by considering the data from different aspects. The major difficulty in clustering ensembles is the consensus function, the algorithm that combines the component partitions into a final partition; in other words, finding a consensus partition from the output partitions of the various clustering algorithms. The combination of multiple clusterings can also be viewed as finding a median partition with respect to the given partitions, which is proven to be NP-complete [14].

H. Yin et al. (Eds.): IDEAL 2007, LNCS 4881, pp. 395–405, 2007. © Springer-Verlag Berlin Heidelberg 2007
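The second diversity mechanism listed above, varying the initialization, is the one most relevant to ensembles built from repeated k-means runs. The following is a minimal illustrative sketch (not code from the paper; the helper names `kmeans` and `co_membership` are ours): several k-means runs with different random seeds produce different partitions, and their disagreement can be measured on co-membership matrices so that arbitrary label permutations do not matter.

```python
import numpy as np

def kmeans(X, k, seed, iters=50):
    """Minimal Lloyd's k-means; returns a label vector of length len(X)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def co_membership(labels):
    """n x n binary matrix: entry (i, j) is 1 iff points i and j share a cluster."""
    return (labels[:, None] == labels[None, :]).astype(int)

rng = np.random.default_rng(42)
X = rng.uniform(size=(60, 2))                     # no strong structure: runs vary
partitions = [kmeans(X, k=3, seed=s) for s in range(5)]

# Diversity between two runs: fraction of point pairs whose co-membership differs.
disagree = (co_membership(partitions[0]) != co_membership(partitions[1])).mean()
```

Comparing co-membership matrices rather than raw labels is a standard trick, since two runs may find the same grouping under permuted cluster indices.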


J. Azimi, M. Abdoos, and M. Analoui

There are many types of consensus functions, such as hypergraph partitioning [1, 6], the voting approach [5, 8, 15], the quadratic mutual information algorithm [16], and co-association based functions [2, 17, 18]. In this paper, we propose a new consensus function for clustering ensembles, named the Labeling Algorithm. Instead of using previous consensus functions, which store the results of each k-means run and then obtain the final partition from them, we generate a new feature from the result of each k-means run.
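The idea of turning each initial clustering run into a new feature can be sketched as follows. This is an illustrative reconstruction under our own assumptions, not the paper's exact Labeling Algorithm: the helpers `kmeans` and `label_feature_space` are hypothetical names, and each run's label vector is simply stacked as one column of the generated feature space, which is then clustered again to obtain the final partition.

```python
import numpy as np

def kmeans(X, k, seed, iters=50):
    """Minimal Lloyd's k-means; returns a label vector of length len(X)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def label_feature_space(X, k, n_runs):
    """Each k-means run contributes one new (categorical) feature: its labels."""
    return np.column_stack([kmeans(X, k, seed=s) for s in range(n_runs)])

# Two tight, well-separated blobs; the generated space has one column per run.
X = np.vstack([np.random.default_rng(0).normal(0.0, 0.1, (20, 2)),
               np.random.default_rng(1).normal(5.0, 0.1, (20, 2))])
F = label_feature_space(X, k=2, n_runs=10)

# A simple clustering of the generated space yields the final partition.
final = kmeans(F.astype(float), k=2, seed=0)
```

On such well-separated data every run recovers the two blobs (up to label permutation), so all points of one blob share an identical row in the generated space, which makes the final clustering step easy; on harder data the columns disagree and the second clustering acts as the consensus step.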