A Scalable Multilevel Algorithm for Graph Clustering and Community Structure Detection

One of the most useful measures of cluster quality is the modularity of the partition, which measures the difference between the number of the edges joining vertices from the same cluster and the expected number of such edges in a random (unstructured) gr

PDF / 441,128 Bytes
12 Pages / 430 x 660 pts Page_size
47 Downloads / 236 Views

DOWNLOAD

REPORT

Abstract. One of the most useful measures of cluster quality is the modularity of the partition, which measures the diﬀerence between the number of the edges joining vertices from the same cluster and the expected number of such edges in a random (unstructured) graph. In this paper we show that the problem of ﬁnding a partition maximizing the modularity of a given graph G can be reduced to a minimum weighted cut problem on a complete graph with the same vertices as G. We then show that the resulted minimum cut problem can be eﬃciently solved with existing software for graph partitioning and that our algorithm ﬁnds clusterings of a better quality and much faster than the existing clustering algorithms.

1

Introduction

One way to analyze and understand the information contained in the huge amount of data available on the WWW and the relationships between the individual items is to organize them into ”communities,” maximal groups of related items. Determining the communities is of great theoretical and practical importance since they correspond to entities such as collaboration networks, online social networks, scientiﬁc publications or news stories on a given topic, related commercial items, etc. Communities also arise in other types of networks such as computer and communication networks (the Internet, ad-hoc networks) and biological networks (protein interaction networks, genetic networks). The problem of identifying communities in a network is usually modeled as a graph clustering (GC) problem, where vertices correspond to individual items and edges describe relationships. Then the communities correspond to clusters with a lot of edges between vertices belonging to the same subgraph (called incluster edges) and fewer edges between vertices from diﬀerent subgraphs (called between-cluster edges). The GC problem has been intensively studied in the recent years in relation to its applications in the analysis of networks. Girvan and Newman propose in [11], [18] algorithms based on the betweenness of the edges of a graph, a characteristic that measures the number of the shortest paths in a graph that use any given edge. In [15] Newman describes an algorithm based on a characteristic of clustering quality called modularity, a measure that takes

This work has been supported by the Department of Energy under contract W-705ENG-36.

W. Aiello et al. (Eds.): WAW 2006, LNCS 4936, pp. 117–128, 2008. c Springer-Verlag Berlin Heidelberg 2008

118

H.N. Djidjev

into account the number of in-cluster edges and the expected number of such edges. (We formally deﬁne and discuss modularity in more detail in the next section.) A faster version of the algorithm from [15] was described by Clauset et al. in [6]. Several algorithms have been proposed based on the eigenvectors of the graph Laplacian, e.g., [19], [16]. In all previous cases the algorithms reported in the literature are either not fast enough, or are inaccurate. In this paper we will describe a new approach for GC that uses our newly discovered relationship between the

Data Loading...

A Scalable Multilevel Algorithm for Graph Clustering and Community Structure Detection

Recommend Documents

An Approximation Algorithm for a Semi-supervised Graph Clustering Problem

Relationship Between Community Structure and Clustering Coefficient

A pyramidal community detection algorithm based on a generalization of the clustering coefficient

Adaptive multi-resolution graph-based clustering algorithm for electrofacies analysis

Community Detection Clustering via Gumbel Softmax

ComHapDet: a spatial community detection algorithm for haplotype assembly

Clique-Based Locally Consistent Latent Space Clustering for Community Detection

An Improved Genetic-Based Link Clustering for Overlapping Community Detection

Differentially Private Social Graph Publishing for Community Detection

Compressed graph representation for scalable molecular graph generation

COTILES: Leveraging Content and Structure for Evolutionary Community Detection

Community Detection Algorithm Using Hypergraph Modularity