Graph Clustering Using Early-Stopped Random Walks
Very fast growth of empirical graphs demands clustering algorithms with nearly-linear time complexity. We propose a novel approach to clustering, based on random walks. The idea is to relax the standard spectral method and replace eigenvectors with vector
- PDF / 189,585 Bytes
- 13 Pages / 439.37 x 666.142 pts Page_size
- 44 Downloads / 205 Views
2
Kielce University of Technology, Kielce, Poland [email protected] Institute of Computer Science Polish Academy of Sciences, Warsaw, Poland
Abstract. Very fast growth of empirical graphs demands clustering algorithms with nearly-linear time complexity. We propose a novel approach to clustering, based on random walks. The idea is to relax the standard spectral method and replace eigenvectors with vectors obtained by running early-stopped random walks. We abandoned iterating the random walk algorithm to convergence but instead stopped it after the time that is short compared with the mixing time. The computed vectors constitute a local approximation of the leading eigenvectors. The algorithm performance is competitive to the traditional spectral solutions in terms of computational complexity. We empirically evaluate the proposed approach against other exact and approximate methods. Experimental results show that the use of the early stop procedure does not influence the quality of the clustering on the tested real world data sets. Keywords: Graph clustering
1
· Random walks · Convergence rate
Introduction
Graph partitioning is the problem of dividing a graph into groups, with dense connections within groups and only sparser connections between them. It is an ubiquitous technique which has applications in many fields of computer science and engineering, such as image segmentation [16], in the World Wide Web [7], in biochemical neural networks [24], and in bioinformatics for protein family classification [5] among others. Graph partitioning methods can also be categorized as either global or local optimizers. Global methods try to discover structure of the whole graph. In local partitioning, the goal is to find a group containing a given seed vertex. Hence, essentially, it is the task of finding a bipartition of the graph into two vertex sets. A class of algorithms widely used to detect clusters in graphs is based on spectral methods, [12]. Here, to partition a graph, the eigenvectors of a suitably chosen matrix are used. This matrix represents connectivity between vertices c IFIP International Federation for Information Processing 2016 Published by Springer International Publishing Switzerland 2016. All Rights Reserved K. Saeed and W. Homenda (Eds.): CISIM 2016, LNCS 9842, pp. 416–428, 2016. DOI: 10.1007/978-3-319-45378-1 37
Graph Clustering Using Early-Stopped Random Walks
417
to be grouped. The eigenvector associated to the second smallest eigenvalue of the matrix is called the Fiedler vector, named after Fiedler for his contributions to algebraic graph theory [6]. The Fiedler vector is one of the most important spectral characteristics, carrying information about structural properties of the graph. As spectral algorithms represent global partitioning methods, they are relatively slow and thus have been mostly superseded by faster local algorithms, see e.g. [13]. Moreover, global solutions are infeasible in cases of graphs with only partly known structure, as for example the WWW network. We propose a solut
Data Loading...