A Data Clustering Algorithm Using Cuckoo Search


Abstract

In this paper, we present a novel algorithm for performing k-means clustering using cuckoo search. A known weakness of the K-Means clustering algorithm is that its performance is affected by the initial cluster centers. In this paper, the K-Means algorithm is improved by using cuckoo search to generate the initial cluster centers. Experiments and comparisons with the classical K-Means algorithm indicate that the improved k-means clustering algorithm has clear advantages in execution time.

Keywords Cuckoo search algorithm · K-Means · Clustering · Lévy flight
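The abstract states that the initial cluster centers are generated by cuckoo search. This excerpt does not give the paper's encoding or parameter settings, so what follows is a minimal sketch built on several assumptions. Each nest encodes all k centers as one flattened vector. Fitness is the sum of squared Euclidean distances from each data item to its nearest center (SSE, lower is better). New solutions come from Lévy flights drawn with Mantegna's algorithm. Each generation, the worst fraction `pa` of nests is abandoned and rebuilt at random. The function names and the defaults (`n_nests=15`, `pa=0.25`, `alpha=0.01`, `beta=1.5`, `n_iter=100`) are hypothetical, illustrative choices. The sketch also scales each step by the data range rather than by the distance to the best nest. That is one of several variants in use, not necessarily the authors' rule.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sse(data: Sequence[Point], flat: List[float], k: int) -> float:
    """Sum of squared distances from each item to its nearest candidate center."""
    d = len(data[0])
    centers = [flat[j * d:(j + 1) * d] for j in range(k)]
    return sum(min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers)
               for p in data)


def levy_step(beta: float, rng: random.Random) -> float:
    """Draw one Levy-distributed step length with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_init_centers(data: Sequence[Point], k: int, n_nests: int = 15,
                        pa: float = 0.25, alpha: float = 0.01, beta: float = 1.5,
                        n_iter: int = 100, seed: int = 0) -> List[Point]:
    """Hypothetical helper: search for k initial centers that minimize SSE."""
    rng = random.Random(seed)
    d = len(data[0])
    # Assumption: a nest is k centers flattened into one vector of length k * d,
    # so the per-dimension bounds are repeated k times to match that layout.
    lo = [min(p[i] for p in data) for i in range(d)] * k
    hi = [max(p[i] for p in data) for i in range(d)] * k

    def random_nest() -> List[float]:
        return [rng.uniform(l, h) for l, h in zip(lo, hi)]

    nests = [random_nest() for _ in range(n_nests)]
    fit = [sse(data, x, k) for x in nests]
    for _ in range(n_iter):
        # Levy flights: each cuckoo lays a new egg near its current nest,
        # clipped to the bounding box of the data.
        for i in range(n_nests):
            egg = [min(max(x + alpha * (h - l) * levy_step(beta, rng), l), h)
                   for x, l, h in zip(nests[i], lo, hi)]
            f_egg = sse(data, egg, k)
            j = rng.randrange(n_nests)  # compare with a randomly chosen nest
            if f_egg < fit[j]:
                nests[j], fit[j] = egg, f_egg
        # Abandon the worst fraction pa of nests; the best nest is never dropped.
        order = sorted(range(n_nests), key=fit.__getitem__)
        for idx in order[n_nests - int(pa * n_nests):]:
            nests[idx] = random_nest()
            fit[idx] = sse(data, nests[idx], k)
    best = nests[min(range(n_nests), key=fit.__getitem__)]
    return [tuple(best[j * d:(j + 1) * d]) for j in range(k)]


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
init_centers = cuckoo_init_centers(data, k=2)
print(init_centers)
```

The returned centers are meant to replace the random initial centers of standard K-Means, which then refines them as usual.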

1 Introduction

Clustering is the process of partitioning a given set of patterns into clusters, such that patterns in the same cluster are similar and patterns belonging to different clusters are dissimilar. Clustering is a central task of exploratory data mining and a common technique in neural networks, artificial intelligence, and statistics. Many clustering algorithms have been proposed, and the k-means algorithm and its variants are among them. The k-means method has been shown to produce good clustering results in many practical applications.
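For reference, k-means makes "similar" precise through the criterion it minimizes: the within-cluster sum of squared Euclidean distances. In the notation below (ours, not the paper's), $\mathbf{x}_1, \dots, \mathbf{x}_n$ are the patterns, $C_1, \dots, C_k$ the clusters, and $\boldsymbol{\mu}_j$ the center of $C_j$:

$$
J = \sum_{j=1}^{k} \sum_{\mathbf{x}_i \in C_j} \lVert \mathbf{x}_i - \boldsymbol{\mu}_j \rVert^2,
\qquad
\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{\mathbf{x}_i \in C_j} \mathbf{x}_i .
$$

Each k-means iteration never increases $J$, so the algorithm converges. However, it converges only to a local optimum that depends on the starting centers, which is why the choice of initial centers matters.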

M. Zhao (✉) · H. Tang
Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, College of Computer Science and Technology, Beijing University of Technology, Beijing 100124, China
e-mail: [email protected]

M. Zhao · H. Tang · J. Guo · Y. Sun
Beijing Key Laboratory of Intelligent Logistics System, Beijing Wuzi University, Beijing 101149, China

© Springer Science+Business Media Singapore 2016
J.C. Hung et al. (eds.), Frontier Computing, Lecture Notes in Electrical Engineering 375, DOI 10.1007/978-981-10-0539-8_23



The k-means algorithm is well known for its efficiency in clustering large data sets. K-Means partitions data items into a predefined number k of clusters, each represented by a single center point, using Euclidean distance as the similarity measure. The purpose of K-Means is therefore to find k cluster centers. It starts from randomly chosen initial cluster centers. It then repeatedly reassigns each data item in the dataset to the most similar cluster center and recomputes each center as the mean of the items assigned to it. The reassignment procedure stops when a convergence criterion is met, for instance when the algorithm reaches a fixed number of iterations or when the clustering result no longer changes after a certain number of iterations [1, 2].

However, a direct implementation of the k-means method requires time proportional to the product of the number of patterns and the number of clusters per iteration, which is computationally very expensive, especially for large datasets. Moreover, because the result depends on the initial cluster centers, it is necessary to employ another global optimization algorithm to generate these initial centers. The cuckoo search (CS) algorithm is a biologically-inspired algorithm motivated by a social analogy that can be used to find an optimal, or near opt
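The k-means procedure described above can be sketched as a plain Lloyd-style loop. It uses random initial centers, reassigns items by Euclidean distance, and stops when assignments stop changing or an iteration cap is reached. This is a minimal pure-Python sketch, not the authors' code. The names `kmeans` and `sq_dist`, the optional `init` argument for externally supplied centers, and the `max_iter` default are our own assumptions. The assignment step evaluates n × k distances per iteration, which is the per-iteration cost mentioned in the text.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; ranks centers exactly as Euclidean distance does."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Point], k: int, init: Optional[List[Point]] = None,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd-style k-means: alternate assignment and center update.

    Stops when no data item changes cluster, or after max_iter iterations.
    """
    rng = random.Random(seed)
    # Random initial centers unless the caller supplies them (e.g. from a global search).
    centers = list(init) if init is not None else rng.sample(data, k)
    labels: List[int] = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: n * k distance evaluations per iteration.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # converged: the clustering result did not change
        labels = new_labels
        # Update step: move each center to the mean of the items assigned to it.
        for j in range(k):
            members = [p for p, c in zip(data, labels) if c == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(data, 2, init=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)  # [(0.0, 0.5), (10.0, 10.5)] [0, 0, 1, 1]
```

Passing `init` is how better starting centers, such as those found by a global search, would replace the random choice without changing the rest of the loop.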
