A hybrid approach to speed-up the k -means clustering method
- PDF / 942,018 Bytes
- 11 Pages / 595.276 x 790.866 pts Page_size
- 91 Downloads / 178 Views
ORIGINAL ARTICLE
A hybrid approach to speed-up the k-means clustering method T. Hitendra Sarma • P. Viswanath B. Eswara Reddy
•
Received: 14 June 2011 / Accepted: 16 January 2012 / Published online: 15 February 2012 Ó Springer-Verlag 2012
Abstract k-means clustering method is an iterative partition-based method which for finite data-sets converges to a solution in a finite time. The running time of this method grows linearly with respect to the size of the data-set. Many variants have been proposed to speed-up the conventional k-means clustering method. In this paper, we propose a prototype-based hybrid approach to speed-up the k-means clustering method. The proposed method, first partitions the data-set into small clusters (grouplets), which are of varying sizes. Each grouplet is represented by a prototype. Later, the set of prototypes is partitioned into k clusters using the modified k-means method. The modified k-means clustering method is similar to the conventional k-means method but it avoids empty clusters (the clusters to which no pattern is assigned) in the iterative process. In each cluster of prototypes, each prototype is replaced by its corresponding set of patterns (which formed the grouplet) to derive a partition of the data-set. Since this partition of the data-set can deviate from the partition obtained using the conventional k-means method over the entire data-set, a correcting step is proposed. Both theoretically and experimentally, the conventional k-means method and the
T. H. Sarma (&) P. Viswanath Department of Computer Science and Engineering, Rajeev Gandhi Memorial College of Engineering and Technology, Nandyal 518501, Andhra Pradesh, India e-mail: [email protected] P. Viswanath e-mail: [email protected] B. E. Reddy Department of Computer Science and Engineering, JNTUA College of Engineering, Anantapur 515002, Andhra Pradesh, India e-mail: [email protected]
proposed hybrid method (augmented with the correcting step) are shown to yield the same result (provided, the initial k seed points are same). But, the proposed method is much faster than the conventional one. Experimentally, the proposed method is compared with the conventional method and the other recent methods that are proposed to speed-up the k-means method. Keywords k-means clustering method Prototypes Hybrid clustering Leaders clustering method
1 Introduction Data clustering is a process of identifying the natural grouping that exist in a given data-set, such that the patterns in the same cluster are more similar and the patterns in different clusters are less similar. It has been considered as an important tool in various applications like biology, marketing, information retrieval, remote sensing, statistics, pattern recognition, image processing, text mining, etc. [17, 20]. Clustering algorithms can be broadly divided into two groups, viz., hierarchical and partitional [20]. Hierarchical clustering algorithms recursively find nested clusters either in agglomerative (bottom-up) mode or in divisive (top-dow
Data Loading...