Gabriel graph-based connectivity and density for internal validity of clustering
- PDF / 2,127,650 Bytes
- 18 Pages / 595.276 x 790.866 pts Page_size
- 90 Downloads / 192 Views
REGULAR PAPER
Gabriel graph‑based connectivity and density for internal validity of clustering Fatima Boudane1 · Ali Berrichi1 Received: 9 August 2019 / Accepted: 12 May 2020 © Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract Clustering has an important role in data mining field. However, there is a large variety of clustering algorithms and each could generate quite different results depending on input parameters. In the research literature, several cluster validity indices have been proposed to evaluate clustering results and find the partition that best fits the input dataset. However, these validity indices may fail to achieve satisfactory results, especially in case of clusters with arbitrary shapes. In this paper, we propose a new cluster validity index for density-based, arbitrarily shaped clusters. Our new index is based on the density and connectivity relations extracted among the data points, based on the proximity graph, Gabriel graph. The incorporation of the connectivity and density relations allows achieving the best clustering results in the case of clusters with any shape, size or density. The experimental results on synthetic and real datasets, using the well-known neighborhood-based clustering (NBC) algorithm and the DBSCAN (density-based spatial clustering of applications with noise) algorithm, illustrate the superiority of the proposed index over some classical and recent indices and show its effectiveness for the evaluation of clustering algorithms and the selection of their appropriate parameters. Keywords Cluster validity index · Density · Connectivity · Arbitrarily shaped clusters · Gabriel graph
1 Introduction Clustering algorithms are widely used in many applications such as pattern recognition [1], biology [2], image processing [3] and computer security [4]. The goal of these algorithms is to organize a dataset into groups (clusters) of objects that are similar in the context of the defined problem. In other words, clustering algorithms aim to maximize the similarity within clusters and the dissimilarity between different clusters. However, each clustering algorithm can produce quite different results depending on input parameters. Indeed, the user is always confronted with the difficulty to select the best clustering algorithm and the best values of its input parameters. These problems can be addressed based on cluster validity analysis [5]. For example, the performance * Fatima Boudane [email protected] Ali Berrichi ali.berrichi@univ‑boumerdes.dz 1
LIMOSE Laboratory, Department of Computer Science, Faculty of Sciences, University M’hamed Bougara of Boumerdes, Boumerdès, Algeria
of the K-means algorithm [6] is very sensitive to the number of clusters as input parameter. Thus, for an effective clustering process, we should run the algorithm several times with different numbers of clusters. Then, the best clustering is selected among the obtained partitions using a cluster validity index (CVI). A CVI is a quality indicator of a clustering algorithm and its in
Data Loading...