Gabriel graph-based connectivity and density for internal validity of clustering

PDF / 2,127,650 Bytes
18 Pages / 595.276 x 790.866 pts Page_size
90 Downloads / 211 Views

REGULAR PAPER

Gabriel graph‑based connectivity and density for internal validity of clustering Fatima Boudane1 · Ali Berrichi1 Received: 9 August 2019 / Accepted: 12 May 2020 © Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Clustering has an important role in data mining field. However, there is a large variety of clustering algorithms and each could generate quite different results depending on input parameters. In the research literature, several cluster validity indices have been proposed to evaluate clustering results and find the partition that best fits the input dataset. However, these validity indices may fail to achieve satisfactory results, especially in case of clusters with arbitrary shapes. In this paper, we propose a new cluster validity index for density-based, arbitrarily shaped clusters. Our new index is based on the density and connectivity relations extracted among the data points, based on the proximity graph, Gabriel graph. The incorporation of the connectivity and density relations allows achieving the best clustering results in the case of clusters with any shape, size or density. The experimental results on synthetic and real datasets, using the well-known neighborhood-based clustering (NBC) algorithm and the DBSCAN (density-based spatial clustering of applications with noise) algorithm, illustrate the superiority of the proposed index over some classical and recent indices and show its effectiveness for the evaluation of clustering algorithms and the selection of their appropriate parameters. Keywords Cluster validity index · Density · Connectivity · Arbitrarily shaped clusters · Gabriel graph

1 Introduction Clustering algorithms are widely used in many applications such as pattern recognition [1], biology [2], image processing [3] and computer security [4]. The goal of these algorithms is to organize a dataset into groups (clusters) of objects that are similar in the context of the defined problem. In other words, clustering algorithms aim to maximize the similarity within clusters and the dissimilarity between different clusters. However, each clustering algorithm can produce quite different results depending on input parameters. Indeed, the user is always confronted with the difficulty to select the best clustering algorithm and the best values of its input parameters. These problems can be addressed based on cluster validity analysis [5]. For example, the performance * Fatima Boudane [email protected] Ali Berrichi ali.berrichi@univ‑boumerdes.dz 1

LIMOSE Laboratory, Department of Computer Science, Faculty of Sciences, University M’hamed Bougara of Boumerdes, Boumerdès, Algeria

of the K-means algorithm [6] is very sensitive to the number of clusters as input parameter. Thus, for an effective clustering process, we should run the algorithm several times with different numbers of clusters. Then, the best clustering is selected among the obtained partitions using a cluster validity index (CVI). A CVI is a quality indicator of a clustering algorithm and its in

Data Loading...

Gabriel graph-based connectivity and density for internal validity of clustering

Recommend Documents

Internal and External Validity Issues

Internal Validity in Experiments for Typefaces for People with Dyslexia

Target Validity: Bringing Treatment of External Validity in Line with Internal Validity

Density-based Clustering

Gabriel Biel

A survey of density based clustering algorithms

Target Tracking Algorithm Based on Density Clustering

ESDBSCAN: Enhanced Shuffling Based Density Clustering

McDPC: multi-center density peak clustering

A New Fuzzy Clustering Validity Index with Strong Robustness

Privacy-Preserving and Outsourced Density Peaks Clustering Algorithm

Reliability and validity of a new scale on internal coherence (ICS) of cancer patients