A hybrid approach to speed-up the k -means clustering method

PDF / 942,018 Bytes
11 Pages / 595.276 x 790.866 pts Page_size
91 Downloads / 198 Views

ORIGINAL ARTICLE

A hybrid approach to speed-up the k-means clustering method T. Hitendra Sarma • P. Viswanath B. Eswara Reddy

•

Received: 14 June 2011 / Accepted: 16 January 2012 / Published online: 15 February 2012 Ó Springer-Verlag 2012

Abstract k-means clustering method is an iterative partition-based method which for finite data-sets converges to a solution in a finite time. The running time of this method grows linearly with respect to the size of the data-set. Many variants have been proposed to speed-up the conventional k-means clustering method. In this paper, we propose a prototype-based hybrid approach to speed-up the k-means clustering method. The proposed method, first partitions the data-set into small clusters (grouplets), which are of varying sizes. Each grouplet is represented by a prototype. Later, the set of prototypes is partitioned into k clusters using the modified k-means method. The modified k-means clustering method is similar to the conventional k-means method but it avoids empty clusters (the clusters to which no pattern is assigned) in the iterative process. In each cluster of prototypes, each prototype is replaced by its corresponding set of patterns (which formed the grouplet) to derive a partition of the data-set. Since this partition of the data-set can deviate from the partition obtained using the conventional k-means method over the entire data-set, a correcting step is proposed. Both theoretically and experimentally, the conventional k-means method and the

T. H. Sarma (&) P. Viswanath Department of Computer Science and Engineering, Rajeev Gandhi Memorial College of Engineering and Technology, Nandyal 518501, Andhra Pradesh, India e-mail: [email protected] P. Viswanath e-mail: [email protected] B. E. Reddy Department of Computer Science and Engineering, JNTUA College of Engineering, Anantapur 515002, Andhra Pradesh, India e-mail: [email protected]

proposed hybrid method (augmented with the correcting step) are shown to yield the same result (provided, the initial k seed points are same). But, the proposed method is much faster than the conventional one. Experimentally, the proposed method is compared with the conventional method and the other recent methods that are proposed to speed-up the k-means method. Keywords k-means clustering method Prototypes Hybrid clustering Leaders clustering method

1 Introduction Data clustering is a process of identifying the natural grouping that exist in a given data-set, such that the patterns in the same cluster are more similar and the patterns in different clusters are less similar. It has been considered as an important tool in various applications like biology, marketing, information retrieval, remote sensing, statistics, pattern recognition, image processing, text mining, etc. [17, 20]. Clustering algorithms can be broadly divided into two groups, viz., hierarchical and partitional [20]. Hierarchical clustering algorithms recursively find nested clusters either in agglomerative (bottom-up) mode or in divisive (top-dow

Data Loading...

A hybrid approach to speed-up the k -means clustering method

Recommend Documents

k-Means Clustering

A Novel MapReduce Based k-Means Clustering

Rethinking k -means clustering in the age of massive datasets: a constant-time approach

A Quantum-Inspired Genetic K-Means Algorithm for Gene Clustering

A Color Image Segmentation Method Based on Improved K-Means Clustering Algorithm

A novel prediction method of complex univariate time series based on k -means clustering

An entropy-based initialization method of K -means clustering on the optimal number of clusters

Road accident prediction and model interpretation using a hybrid K-means and random forest algorithm approach

Clustering Analysis of Extreme Temperature Based on K-means Algorithm

Using a Set of Triangle Inequalities to Accelerate K-means Clustering

K-Means Clustering for Features Arrangement in Metagenomic Data Visualization

Applying K-means Clustering and Genetic Algorithm for Solving MTSP