An exemplar-based clustering using efficient variational message passing

  • PDF / 2,460,767 Bytes
  • 42 Pages / 439.37 x 666.142 pts Page_size
  • 1 Downloads / 203 Views

DOWNLOAD

REPORT


An exemplar-based clustering using efficient variational message passing Mohamed Hamza Ibrahim1,2

· Rokia Missaoui1

Received: 7 February 2019 / Accepted: 21 October 2020 © The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract Clustering is a crucial step in scientific data analysis and engineering systems. Thus, an efficient cluster analysis method often remains a key challenge. In this paper, we introduce a general purpose exemplar-based clustering method called (MEGA), which performs a novel message-passing strategy based on variational expectation– maximization and generalized arc-consistency techniques. Unlike message passing clustering methods, MEGA formulates the message-passing schema as E- and M-steps of variational expectation–maximization based on a reparameterized factor graph. It also exploits an adaptive variant of generalized arc consistency technique to perform a variational mean-field approximation in E-step to minimize a Kullback–Leibler divergence on the model evidence. Dissimilar to density-based clustering methods, MEGA has no sensitivity to initial parameters. In contrast to partition-based clustering methods, MEGA does not require pre-specifying the number of clusters. We focus on the binary-variable factor graph to model the clustering problem but MEGA is applicable to other graphical models in general. Our experiments on real-world problems demonstrate the efficiency of MEGA over existing prominent clustering algorithms such as Affinity propagation, Agglomerative, DBSCAN, K-means, and EM. Keywords Affinity propagation · Message-passing · Generalized arc-consistency · Cluster analysis · Expectation–maximization

Responsible editor: Fei Wang

B

Mohamed Hamza Ibrahim [email protected] Rokia Missaoui [email protected]

1

Département d’informatique et d’ingénierie, Université du Québec en Outaouais, Gatineau, QC, Canada

2

Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt

123

Ibrahim et. al.

1 Introduction Cluster analysis aims at dividing objects into homogeneous groups (called clusters) based on their links or attributes such that the objects in each group are more cohesive to each other and more separable from objects in other groups. From an analytical perspective, it is a fundamental task that can be used for extracting representatives (also called exemplars or centroids) of objects and discovering their local structures. Thus, it often impacts a wide number of applications in different research communities, including data mining, social network analysis, information retrieval, unsupervised machine learning, pattern recognition, computer vision, and Bioinformatics. Consequently, many traditional clustering methods have been proposed [see Saxena et al. (2017) for a detailed survey], each of which is based on an induction approach and exploits valuable characteristics existing in objects and/or attributes to approximate cluster solutions that optimize certain objective function