A methodology for automatic parameter-tuning and center selection in density-peak clustering methods

METHODOLOGIES AND APPLICATION

José Carlos García-García¹ · Ricardo García-Ródenas¹

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract The density-peak clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach. The method has the advantage of identifying non-convex clusters, as well as clusters of variable size and density, but it also has some limitations, such as the visual (manual) selection of cluster centers and the tuning of its parameters. This paper describes an optimization-based methodology for automatic parameter and center selection that is applicable both to DPC and to other algorithms derived from it. The objective function is an internal or external cluster validity index, and the decision variables are the parameterization of the algorithm and the choice of centers. Internal validation measures lead to an automatic parameter-tuning process, while external validation measures lead to the so-called optimal rules, a tool for bounding from above the performance that a given algorithm can achieve over its set of parameterizations. A numerical experiment with real data was performed for DPC and for the fuzzy weighted k-nearest neighbor variant of DPC (FKNN-DPC); it validates the automatic parameter-tuning methodology and demonstrates its efficiency compared to the state of the art.

Keywords Density peaks clustering · Automatic parameter tuning · Optimal rules · Cluster validity index · Differential entropy
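To make the tuning problem concrete, the following is a minimal Python sketch, not the authors' implementation, of DPC as defined by Rodríguez and Laio (2014): each point receives a local density rho and a distance delta to its nearest point of higher density, and the remaining points inherit the label of that denser neighbor in a single pass. Several choices are assumptions made only for illustration: rho uses a Gaussian kernel with cutoff d_c; centers are the k points with the largest gamma = rho · delta (a common heuristic, not necessarily the center-selection procedure of this paper); the mean silhouette stands in for the internal validity index; and a small exhaustive grid over d_c (taken as percentiles of the pairwise distances) and k stands in for the optimization method. The names `dpc`, `silhouette` and `tune_dpc` are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Symmetric Euclidean distance matrix (list of lists)."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


def dpc(d, dc, k):
    """Minimal DPC sketch (hypothetical helper): one cluster label per point."""
    n = len(d)
    # Local density rho_i with a Gaussian kernel (assumed; a cutoff kernel is another common choice).
    rho = [sum(math.exp(-(d[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Decreasing-density order; ties broken by index so "higher density" is well defined.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nn = [-1] * n  # index of the nearest point of higher density
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(d[i])  # densest point: its largest distance to any other point
            continue
        j = min(order[:pos], key=lambda q: d[i][q])
        delta[i], nn[i] = d[i][j], j
    # Centers: the k largest gamma = rho * delta (assumed heuristic). The densest point
    # always has the largest gamma, so it is always a center and never reads nn = -1.
    rank = {i: pos for pos, i in enumerate(order)}
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], rank[i]))[:k]
    labels = [-1] * n
    for cid, c in enumerate(centers):
        labels[c] = cid
    # Single pass in decreasing density: inherit the label of the nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nn[i]]
    return labels


def silhouette(d, labels):
    """Mean silhouette width, used here only as a stand-in internal validity index."""
    n = len(d)
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: silhouette taken as 0
        a = sum(d[i][j] for j in own) / len(own)
        b = min(sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def tune_dpc(points, dc_percentiles=(1, 2, 5, 10, 20), k_range=range(2, 6)):
    """Exhaustive search over (dc, k); returns (score, dc, k, labels) of the best partition."""
    d = pairwise_distances(points)
    flat = sorted(d[i][j] for i, j in combinations(range(len(d)), 2))
    best = None
    for p in dc_percentiles:
        dc = flat[min(len(flat) - 1, int(len(flat) * p / 100))]
        if dc <= 0:
            continue  # a zero cutoff would divide by zero in the kernel
        for k in k_range:
            labels = dpc(d, dc, k)
            score = silhouette(d, labels)
            if best is None or score > best[0]:
                best = (score, dc, k, labels)
    return best
```

For example, on two well-separated groups of five points, `tune_dpc` should select k = 2 and recover the two groups. Swapping the silhouette for an external index computed against known labels would turn the same search loop into something closer in spirit to the optimal rules described above, again only as an illustration.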

Communicated by V. Loia.

¹ Departamento de Matemáticas, Escuela Superior de Informática, Universidad de Castilla-La Mancha, Paseo de la Universidad, 4, Ciudad Real 13071, Spain

1 Introduction
The study of clustering techniques is a very active area of research in machine learning. Clustering is widely applied in pattern recognition, bioinformatics and image processing; its goal is to partition a dataset into groups of objects with similar features. Clustering methods can be divided into hierarchical, partitioning, density-based, model-based, grid-based and soft computing methods, or combinations of these. Recently, Rodríguez and Laio (2014) described a new clustering method based on a fast search for density peaks (DPC). This algorithm rests on the idea that cluster centers have a higher density than their neighbors and lie at a relatively large distance from any point with higher density. Liu et al. (2018) note the following two essential advantages of DPC:
1. The algorithm is simple and efficient, and it can quickly find the high-density peak points (cluster centers).
2. The DPC algorithm is suitable for cluster analysis of large-scale data because the data points are assigned to clusters in a single round, based on the minimum nearest distance to a cluster center.
Wiwie et al. (2015) introduce the integrative clustering evaluation fra