Ensemble-based clustering of large probabilistic graphs using neighborhood and distance metric learning

PDF / 2,893,839 Bytes
28 Pages / 439.37 x 666.142 pts Page_size
33 Downloads / 192 Views

Ensemble‑based clustering of large probabilistic graphs using neighborhood and distance metric learning Malihe Danesh1 · Morteza Dorrigiv1 · Farzin Yaghmaee1 Accepted: 3 September 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Graphs are commonly used to express the communication of various data. Faced with uncertain data, we have probabilistic graphs. As a fundamental problem of such graphs, clustering has many applications in analyzing uncertain data. In this paper, we propose a novel method based on ensemble clustering for large probabilistic graphs. To generate ensemble clusters, we develop a set of probable possible worlds of the initial probabilistic graph. Then, we present a probabilistic co-association matrix as a consensus function to integrate base clustering results. It relies on co-occurrences of node pairs based on the probability of the corresponding common cluster graphs. Also, we apply two improvements in the steps before and after of ensembles generation. In the before step, we append neighborhood information based on node features to the initial graph to achieve a more accurate estimation of the probability between the nodes. In the after step, we use supervised metric learning-based Mahalanobis distance to automatically learn a metric from ensemble clusters. It aims to gain crucial features of the base clustering results. We evaluate our work using five real-world datasets and three clustering evaluation metrics, namely the Dunn index, Davies–Bouldin index, and Silhouette coefficient. The results show the impressive performance of clustering large probabilistic graphs. Keywords Probabilistic graph · Ensemble clustering · Distance metric learning · Neighborhood

* Morteza Dorrigiv [email protected] Malihe Danesh [email protected] Farzin Yaghmaee [email protected] 1

Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

13

Vol.:(0123456789)

M. Danesh et al.

1 Introduction In the era of big data, data management systems often face uncertainty. Also, many applications are increasingly generating interdependent data, which may lead to uncertainty between the data. In these cases, probabilistic graphs provide a method to visualize these data, where uncertainty is assigned by probable value to each edge. In the probabilistic graphs, each edge is labeled by a probability of data uncertainty. For example, in the protein interaction network, an edge between two interactive proteins corresponds to the interaction obtained through a noise test. Thus, due to the limitations of the measurement model, it represents uncertainty in the form of a probability value. Besides, in social networks, the probability of the existence of an edge between two people may be used to model the truthiness of a relation or influence one over the other one. With the rising growth of applications with uncertain data, probabilistic graph mining has become increasingly essential. Hence, some mining problems have recently been studied for this pu

Data Loading...

Ensemble-based clustering of large probabilistic graphs using neighborhood and distance metric learning

Recommend Documents

KPML: A Novel Probabilistic Perspective Kernel Mahalanobis Distance Metric Learning Model for Semi-supervised Clustering

Ground Metric Learning on Graphs

Dynamic Distance Hereditary Graphs Using Split Decomposition

DDNE: Discriminative Distance Metric Learning for Network Embedding

Clustering Social Networks Using Distance-Preserving Subgraphs

Joint Learning of Distance Metric and Kernel Classifier via Multiple Kernel Learning

Distance Domination in Graphs

Probabilistic and Distance-Aware Clustering Approach to Decrease the Effect of Hot Spot in WSN

k -Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint

Hybridization of K-means Clustering Using Different Distance Function to Find the Distance Among Dataset

Fixed Point Theory in Probabilistic Metric Spaces

Symbolic Clustering of Constrained Probabilistic Data