Grid-Based Clustering Algorithm Based on Intersecting Partition and Density Estimation
In order to solve the problem that traditional grid-based clustering techniques lack of the capability of dealing with data of high dimensionality, we propose an intersecting grid partition method and a density estimation method. The partition method can
- PDF / 176,817 Bytes
- 10 Pages / 430 x 660 pts Page_size
- 14 Downloads / 203 Views
2
School of Information & Engineering, Zhengzhou University, Zhengzhou, 450052, China School of Electronic Information & Engineering, Xi’an Jiaotong University, Xi’an, 710049, China {bzqiu,iexlli}@zzu.edu.cn
Abstract. In order to solve the problem that traditional grid-based clustering techniques lack of the capability of dealing with data of high dimensionality, we propose an intersecting grid partition method and a density estimation method. The partition method can greatly reduce the number of grid cells generated in high dimensional data space and make the neighbor-searching easily. On basis of the two methods, we propose grid-based clustering algorithm (GCOD), which merges two intersecting grids according to density estimation. The algorithm requires only one parameter and the time complexity is linear to the size of the input data set or data dimension. The experimental results show that GCOD can discover arbitrary shapes of clusters and scale well. Keywords: Grid-based clustering, intersecting partition, density estimation, algorithm.
1
Introduction
Clustering has been an important field of data mining, which groups a large amount of data samples into several clusters, maximizing the similarity of data samples in the same cluster and minimizing the similarity between data samples in different clusters [1] . Currently, clustering algorithms can be categorized into partition-based, hierarchical, density-based, grid-based and model-based. Gridbased clustering algorithms are drawing great attention for its advantages of discovering clusters of different shapes and sizes with high efficiency. Traditional grid-based clustering approaches can be divided into two categories according to partition method: fix-up grid partition method and adaptive grid partition method. Certain clustering algorithms for example CLIQUE[2], STING [3], WaveCluster [4], DENCLUE [5], GDILC [6], shifting grid clustering[7], SCI [8]and Dclust[9] adopt fix-up grid partition technology; while some other clustering algorithms adopt adaptive grid partition method, for example MAFIA [10], OptiGrid [11], MMNG [12], DESCRY [13], CBCM [14], GCHL [15] and so on. A typical grid-based clustering algorithm using fix-up grid partition technology is CLIQUE, whose basic idea is to partition every dimension of data space T. Washio et al. (Eds.): PAKDD 2007 Workshops, LNAI 4819, pp. 368–377, 2007. c Springer-Verlag Berlin Heidelberg 2007
GCOD Based on Intersecting Partition and Density Estimation
369
into intervals with equal length, and then non-intersecting rectangular cells with equal size are formed. Because points fallen into the same grid are probably to belong to a cluster, they can be processed as one object. All the clustering processes are operated on these grid cells, so they have nothing to do with the size of input data, but the number of grid cells. However, fix-up grid partition clustering approach suffers problems as follows: (1) The position of fix-up grid may cause the loss of small clusters. If certain cluster is partitioned into several a
Data Loading...