Clustering Social Networks Using Distance-Preserving Subgraphs

Cluster analysis describes the division of a dataset into subsets of related objects, which are usually disjoint. There is considerable variety among the different types of clustering algorithms. Some of these clustering algorithms represent the dataset a

  • PDF / 408,304 Bytes
  • 19 Pages / 439.36 x 666.15 pts Page_size
  • 51 Downloads / 227 Views

DOWNLOAD

REPORT


Clustering Social Networks Using Distance-Preserving Subgraphs Ronald Nussbaum, Abdol-Hossein Esfahanian, and Pang-Ning Tan

Abstract Cluster analysis describes the division of a dataset into subsets of related objects, which are usually disjoint. There is considerable variety among the different types of clustering algorithms. Some of these clustering algorithms represent the dataset as a graph, and use graph-based properties to generate the clusters. However, many graph properties have not been explored as the basis for a clustering algorithm. In graph theory, a subgraph of a graph is distance-preserving if the distances (lengths of shortest paths) between every pair of vertices in the subgraph are the same as the corresponding distances in the original graph. In this paper, we consider the question of finding proper distance-preserving subgraphs, and the problem of partitioning a simple graph into an arbitrary number of distance-preserving subgraphs for clustering purposes. We then present a clustering algorithm called DP-Cluster, based on the notion of distance-preserving subgraphs. We also introduce the concept of relaxation values to the distance-preserving subgraph finding heuristic embedded in DP-Cluster, and investigate this and other variations of the algorithm. One area of research that makes considerable use of graph theory is the analysis of social networks. For this reason we evaluate the performance of DP-Cluster on two realworld social network datasets.

14.1 Introduction Cluster analysis is a technique for partitioning a dataset into groups of similar objects. It is often used as an exploratory data analysis tool to determine the underlying structure of a data set. A recent area of application is in the realm of social networks, where the goal is to find tightly-knit groups of people, also

R. Nussbaum ()  A.-H. Esfahanian  P.-N. Tan Michigan State University, East Lansing, MI 48824-1226, USA e-mail: [email protected]; [email protected]; [email protected] T. Özyer et al. (eds.), The Influence of Technology on Social Network Analysis and Mining, Lecture Notes in Social Networks 6, DOI 10.1007/978-3-7091-1346-2__14, © Springer-Verlag Wien 2013

331

332

R. Nussbaum et al.

known as communities, based on their social relations. Communities are detected by partitioning the network into subgraphs in such a way that optimizes certain graph properties (e.g., minimizing the cut between clusters [9], or maximizing an edge-based modularity function [16]). In addition to these criteria, there are other graph invariants that are potentially applicable to clustering social networks. The motivation for our work is to explore the effect of alternative graph invariants on the process of community finding. One of the difficulties with cluster analysis is that there is no strict definition of what a cluster is. This makes it more difficult to conjecture what sort of concepts might be usable in creating a clustering algorithm. In a social network, a vertex usually represents a person or small group of perso