GDPC: generalized density peaks clustering algorithm based on order similarity


ORIGINAL ARTICLE

Xiaofei Yang¹,² · Zhiling Cai¹ · Ruijia Li¹ · William Zhu¹

Received: 11 June 2019 / Accepted: 6 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Clustering is a fundamental approach to discovering valuable information in data mining and machine learning. Density peaks clustering (DPC) is a typical density-based clustering method and has received increasing attention in recent years. However, DPC and most of its improvements still suffer from some drawbacks. For example, it is difficult to find peaks in sparse cluster regions, and the assignment of the remaining points tends to cause a domino effect, especially for complicated data. To address these two problems, we propose a generalized density peaks clustering algorithm (GDPC) based on a new order similarity, which is calculated from the order rank of the Euclidean distance between two samples. The order similarity helps to find peaks in sparse regions. In addition, a two-step assignment is used to weaken the domino effect. In general, GDPC can not only discover clusters in datasets of different sizes, dimensions and shapes, but also address the above two issues. Experiments on several datasets, including Lung, COIL20, ORL, USPS, Mnist, breast and Vote, show that our algorithm is effective in most cases.

Keywords Clustering · Order similarity · Density · Density peak · Graph
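The abstract says only that the order similarity is calculated from the order rank of the Euclidean distance between two samples; the exact formula is not given in this excerpt. The following is a minimal sketch of the order-rank idea alone, not the authors' similarity. The function name `order_rank_matrix` and the toy data are hypothetical, chosen for illustration.

```python
import numpy as np

def order_rank_matrix(X):
    """Return rank[i, j]: the position of sample j when all samples are
    sorted by Euclidean distance from sample i (0 = i itself, assuming
    no duplicate points; 1 = nearest neighbour)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # order[i, k] is the index of the k-th closest sample to i.
    order = np.argsort(dist, axis=1, kind="stable")
    n = X.shape[0]
    rank = np.empty((n, n), dtype=int)
    rows = np.arange(n)[:, None]
    # Invert the permutation row by row: rank[i, order[i, k]] = k.
    rank[rows, order] = np.arange(n)[None, :]
    return rank

# Toy 1-D data (illustrative only): a dense group and a sparse group.
X = np.array([[0.0], [0.1], [0.3], [10.0], [12.0], [15.0]])
rank = order_rank_matrix(X)

# Sample 1 is 0.1 away from sample 0; sample 4 is 2.0 away from sample 3.
# Both are nevertheless the rank-1 neighbour of their reference sample.
print(rank[0, 1], rank[3, 4])
```

Because ranks depend only on the ordering of distances, a nearest neighbour in a sparse region receives the same rank as one in a dense region, which is consistent with the abstract's claim that order information helps to find peaks in sparse regions. Note that the rank is asymmetric in general (`rank[i, j]` need not equal `rank[j, i]`); how the paper combines ranks into a similarity between two samples is not shown here.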

* William Zhu [email protected]
Xiaofei Yang [email protected]

¹ Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
² School of Science, Xi’an Polytechnic University, Xi’an, China

1 Introduction

Clustering, as an unsupervised learning method, is a fundamental data analysis tool for discovering hidden information in the real world. Using only sample attribute values, it groups sample points into clusters with high intra-cluster similarity and low inter-cluster similarity under given criteria [1]. Clustering is widely applied in various fields, such as information retrieval, machine learning, data mining, pattern recognition and image processing [2].

Density-based clustering algorithms, such as DBSCAN [3], OPTICS [4], DBCLASD [5] and DENCLUE [6], have attracted much attention among researchers. These methods assume that a cluster is a set of high-density points separated by low-density points. Although they are capable of discovering clusters of arbitrary shapes, they are sensitive to data with different densities. For example, in DBSCAN, there are usually many high-density points in a dense region and few high-density points in a sparse region. As a result, more low-density points in a sparse region may be treated as noise, leading to improper clustering of the whole dataset.

Density peaks clustering (DPC) [7] is also a typical density-based algorithm. This algorithm can assign a low-density point to a cluster peak and obtain the final clusters. But
