ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data


THEORETICAL ADVANCES

Kavan Fatehi¹ · Mohsen Rezvani² · Mansoor Fateh²

Received: 1 January 2018 / Accepted: 30 March 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

The curse of dimensionality is one of the major challenges in clustering high-dimensional data. Recently, a considerable amount of literature has been published on subspace clustering to address this challenge. The main objective of subspace clustering is to discover clusters embedded in any possible combination of the attributes. Most previous approaches generate redundant subspace clusters, which reduces clustering accuracy and increases running time. In this paper, a bottom-up density-based approach is proposed for clustering high-dimensional data. We employ the cluster structure as a similarity measure to generate the optimal subspaces, which raises the accuracy of the subspace clustering. Using this idea, we propose an iterative algorithm that discovers similar subspaces based on the similarity of their features. At each iteration, the algorithm first determines similar subspaces, then combines them to generate higher-dimensional subspaces, and finally re-clusters the subspaces. The algorithm repeats these steps until it converges to the final clusters. Experiments on various synthetic and real datasets show that the proposed approach is significantly better in both quality and runtime than the state of the art in clustering high-dimensional data. The accuracy of the proposed method is around 34% higher than that of the CLIQUE algorithm and around 6% higher than that of DiSH.

Keywords High-dimensional data · Subspace clustering · Cluster similarity
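The iteration outlined in the abstract (determine similar subspaces, combine them, re-cluster) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the ASCRClu algorithm itself: the clustering step is a simple threshold-linkage grouping rather than the paper's density-based method, the similarity measure is a Jaccard score over co-clustered point pairs, and every name and parameter (`cluster`, `similarity`, `combine_subspaces`, `eps`, `min_pts`, `threshold`) is hypothetical.

```python
from itertools import combinations


def cluster(points, dims, eps, min_pts):
    """Assumed stand-in for a density-based step: link points whose
    Chebyshev distance within `dims` is at most eps, drop small groups."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if max(abs(points[i][d] - points[j][d]) for d in dims) <= eps:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return [frozenset(g) for g in groups.values() if len(g) >= min_pts]


def co_clustered_pairs(clusters):
    """All index pairs that share a cluster: a summary of cluster structure."""
    return {pair for c in clusters for pair in combinations(sorted(c), 2)}


def similarity(clusters_a, clusters_b):
    """Jaccard similarity of two clusterings (an assumed measure)."""
    a, b = co_clustered_pairs(clusters_a), co_clustered_pairs(clusters_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def combine_subspaces(points, eps=0.5, min_pts=3, threshold=0.8):
    """Start from 1-D subspaces; merge similar ones until nothing changes."""
    n_dims = len(points[0])
    subspaces = {frozenset([d]): cluster(points, [d], eps, min_pts)
                 for d in range(n_dims)}
    merged = True
    while merged:
        merged = False
        for s, t in combinations(list(subspaces), 2):
            if similarity(subspaces[s], subspaces[t]) >= threshold:
                u = s | t  # combine into a higher-dimensional subspace
                del subspaces[s], subspaces[t]
                subspaces[u] = cluster(points, sorted(u), eps, min_pts)  # re-cluster
                merged = True
                break  # restart the scan over the updated subspaces
    return subspaces


# Toy data: attributes 0 and 1 share two clusters; attribute 2 is scattered.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.2, 10.0), (0.2, 0.1, 20.0), (0.3, 0.3, 30.0),
    (5.0, 5.0, 3.0), (5.1, 5.2, 13.0), (5.2, 5.1, 23.0), (5.3, 5.3, 33.0),
]
result = combine_subspaces(points)
for dims, clusters in result.items():
    print(sorted(dims), [sorted(c) for c in clusters])
```

On this toy data, attributes 0 and 1 produce the same cluster structure and are merged into one two-dimensional subspace, while attribute 2 yields no clusters and stays on its own. Each merge reduces the number of subspaces by one, so the loop always terminates.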

¹ Yazd University, Yazd, Iran
² Shahrood University of Technology, Shahrood, Iran
* Correspondence: Mohsen Rezvani

1 Introduction

Clustering is the process of grouping objects so that members of the same cluster are more similar to each other than to members of different clusters [7]. Clustering is used in a variety of fields such as biology, astronomy, physics, archaeology and marketing [8]. Traditional clustering algorithms use the full feature space to cluster data based on their similarity. Recently, more sophisticated approaches to data gathering and management have produced datasets with higher dimensionality, so traditional clustering is not considered appropriate for such complex datasets [22]. The curse of dimensionality is one of the main challenges of data clustering in high-dimensional datasets [14], and it becomes more severe as the number of dimensions grows. In other words, under the curse of dimensionality all pairwise distances become nearly the same, so the distance between data points is not a useful discriminator in high-dimensional data [13]. The d
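The distance-concentration effect described in the introduction can be illustrated numerically. The sketch below is an illustration under assumed settings (points drawn uniformly from the unit hypercube, Euclidean distance, a hypothetical helper `relative_contrast`); it measures how much the farthest point differs from the nearest one, relative to the nearest distance, as the number of dimensions grows.

```python
import math
import random


def relative_contrast(n_dims, n_points=200, seed=0):
    """(max distance - min distance) / min distance from one random query
    point to n_points random points in the unit hypercube."""
    rng = random.Random(seed)  # fixed seed keeps the demo reproducible
    query = [rng.random() for _ in range(n_dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


contrasts = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"dims={d:5d}  relative contrast={c:.3f}")
```

In low dimensions the nearest point is many times closer than the farthest one; with 1000 attributes the contrast drops well below 1, so a distance threshold in the full space separates neighbours from non-neighbours poorly.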
