GDPC: generalized density peaks clustering algorithm based on order similarity
ORIGINAL ARTICLE
Xiaofei Yang1,2 · Zhiling Cai1 · Ruijia Li1 · William Zhu1
Received: 11 June 2019 / Accepted: 6 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Clustering is a fundamental approach to discovering valuable information in data mining and machine learning. Density peaks clustering (DPC) is a typical density-based clustering method and has received increasing attention in recent years. However, DPC and most of its improvements still suffer from some drawbacks. For example, it is difficult to find peaks in sparse cluster regions, and the assignment of the remaining points tends to cause a Domino effect, especially for complicated data. To address these two problems, we propose a generalized density peaks clustering algorithm (GDPC) based on a new order similarity, which is calculated from the order rank of the Euclidean distance between two samples. The order similarity helps to find peaks in sparse regions. In addition, a two-step assignment is used to weaken the Domino effect. In general, GDPC can not only discover clusters in datasets regardless of their sizes, dimensions and shapes, but also address the above two issues. Experiments on several datasets, including Lung, COIL20, ORL, USPS, Mnist, breast and Vote, show that our algorithm is effective in most cases.
Keywords Clustering · Order similarity · Density · Density peak · Graph
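The abstract says only that the order similarity is calculated from the order rank of the Euclidean distance between two samples. The exact definition is not given in this excerpt. As a minimal sketch of the underlying idea, the following Python code computes, for each point, the rank of every other point when sorted by Euclidean distance. The function name `order_ranks`, the tie-breaking rule and the toy data are assumptions made for illustration, not the authors' formulation.

```python
import math


def order_ranks(points):
    """Return a matrix R where R[i][j] is the rank of point j among all
    points sorted by Euclidean distance from point i, so R[i][i] == 0.

    Hypothetical helper for illustration only, not the GDPC definition.
    """
    n = len(points)
    ranks = []
    for i in range(n):
        # Sort indices by distance from point i. Point i itself always
        # comes first; remaining ties are broken by index (an assumption).
        order = sorted(
            range(n),
            key=lambda j: (math.dist(points[i], points[j]), j != i, j),
        )
        row = [0] * n
        for rank, j in enumerate(order):
            row[j] = rank
        ranks.append(row)
    return ranks


# A small "dense" point set and the same set stretched by a factor of 10,
# standing in for a sparse region with the same shape.
dense = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
sparse = [(10 * x, 10 * y) for x, y in dense]

# Distances differ by a factor of 10, but the rank structure is identical.
print(order_ranks(dense) == order_ranks(sparse))
```

Because ranks depend only on the ordering of distances, uniformly stretching a region leaves them unchanged. This is one plausible reason a rank-based measure could be less sensitive to differences in density than raw distances are.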
* William Zhu [email protected]
Xiaofei Yang [email protected]
1 Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
2 School of Science, Xi’an Polytechnic University, Xi’an, China

1 Introduction
Clustering, as an unsupervised learning method, is a fundamental data analysis tool for discovering hidden information in the real world. It uses only sample attribute values to categorize sample points into different clusters, such that intra-cluster similarity is high and inter-cluster similarity is low under some specific criteria [1]. Clustering is widely applied in various fields, such as information retrieval, machine learning, data mining, pattern recognition and image processing [2]. Density-based clustering algorithms, such as DBSCAN [3], OPTICS [4], DBCLASD [5] and DENCLUE [6], have attracted much attention among researchers. These methods assume that a cluster is a set of high-density points separated by low-density points. Although they are capable of discovering clusters of arbitrary shapes, they are sensitive to data with different densities. For example, in DBSCAN, there are usually many high-density points in a dense region and few high-density points in a sparse region. As a result, more low-density points in a sparse region may be treated as noise, which leads to improper clustering of the whole dataset. Density peaks clustering (DPC) [7] is also a typical density-based algorithm. This algorithm can assign a low-density point to a cluster peak and thereby obtain the final clusters. But