Adaptive Determining for Optimal Cluster Number of K-Means Clustering Algorithm



Jiankai Sun, Zhong Li, Fengyuan Zou and Yunchu Yang

Abstract  The clustering accuracy of the K-means algorithm depends strongly on the initial number of clusters, and the algorithm takes a long time on large, high-dimensional sample data. To address these problems, this paper proposes a method that reduces the dimensionality of high-dimensional data by a multidimensional scaling transformation and designs a measure that can effectively evaluate the quality of a kernel clustering result. Furthermore, an adaptive method for determining the optimal cluster number is presented: it first predicts the initial cluster number in a low-dimensional space by tree clustering, and the optimal cluster number is then obtained by the adaptive algorithm. Experiments show that this method has higher accuracy and stability.
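The pipeline the abstract outlines — reduce dimensionality by multidimensional scaling, then cluster in the low-dimensional space — can be sketched in NumPy. This is a minimal illustration using the standard formulations of classical MDS and Lloyd's K-means, not the authors' implementation; the tree-clustering prediction and the adaptive selection of the cluster number are omitted.

```python
import numpy as np

def classical_mds(X, dim=2):
    """Classical (Torgerson) MDS: embed rows of X into `dim` dimensions."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ D2 @ J                            # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)                   # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:dim]               # keep the largest `dim`
    scale = np.sqrt(np.maximum(vals[idx], 0.0))
    return vecs[:, idx] * scale

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's K-means with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Running `kmeans` on the MDS embedding rather than on the raw high-dimensional data is what saves time when the sample dimension is large, since each distance computation then costs O(dim) instead of O(original dimension).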



Keywords  Optimal cluster number · Kernel cluster transformation · Dimensionality reduction · Multidimensional scaling

J. Sun · Z. Li (corresponding author) · F. Zou · Y. Yang
Department of Mathematical Sciences, Zhejiang Sci-Tech University, 310018 Hangzhou, China
e-mail: [email protected]

W. Lu et al. (eds.), Proceedings of the 2012 International Conference on Information Technology and Software Engineering, Lecture Notes in Electrical Engineering 211, DOI: 10.1007/978-3-642-34522-7_59, © Springer-Verlag Berlin Heidelberg 2013

59.1 Introduction

Cluster analysis, also called unsupervised learning, divides a sample into several clusters without training; its principle is to make the correlation of data within groups higher than between groups. It has therefore become a very active research topic in data mining and related fields [1].

The K-means clustering algorithm is a widely used iterative descent clustering algorithm. However, it has some shortcomings: (1) K-means clusters data with a circular or nearly circular distribution well, but performs poorly on non-circularly distributed data. (2) The quality of the K-means clustering result depends on the selection of the initial centers and the initial cluster number.

Many improved methods have been proposed to address these shortcomings. To determine better initial cluster centers, Yuan and Zhou [2] used a sampling method to improve the quality of the initial central points. Zalik [3] provided a recursive method to determine the best initial cluster centers, enabling the K-means algorithm to achieve globally optimal convergence. Likas et al. [4, 5] presented the kernel clustering algorithm, a good solution to the problem that K-means cannot handle non-circularly distributed data. Kernel clustering uses a Mercer kernel [6] to map the input samples into a high-dimensional feature space and clusters them in that feature space, so a nonlinear classification problem is transformed into a linear one. The kernel function, which is intro
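The kernel clustering idea described above can be sketched in NumPy as RBF kernel K-means — a minimal textbook illustration, not the authors' algorithm. The key point is that cluster means live only implicitly in feature space: the distance from a mapped point to a cluster mean is computed entirely from kernel values. The `labels` parameter for a deterministic initial assignment is an addition for reproducibility.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def kernel_kmeans(K, k, labels=None, iters=100, seed=0):
    """Kernel K-means on a precomputed kernel matrix K."""
    rng = np.random.default_rng(seed)
    n = len(K)
    labels = rng.integers(0, k, n) if labels is None else labels.copy()
    for _ in range(iters):
        dist = np.zeros((n, k))
        for c in range(k):
            mask = labels == c
            nc = mask.sum()
            if nc == 0:
                dist[:, c] = np.inf
                continue
            # ||phi(x_i) - m_c||^2 = K_ii - 2/|c| * sum_j K_ij + 1/|c|^2 * sum_jl K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / nc
                          + K[np.ix_(mask, mask)].sum() / nc ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Because every step only touches kernel entries, the feature-space mapping never needs to be computed explicitly, which is exactly how the nonlinear problem becomes a linear one in feature space.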