Finding Dense Subgraphs in Relational Graphs

This paper considers the problem of finding large dense subgraphs in relational graphs, i.e., a set of graphs which share a common vertex set. We present an approximation algorithm for finding the densest common subgraph in a relational graph set based on

  • PDF / 322,032 Bytes
  • 14 Pages / 439.37 x 666.142 pts Page_size
  • 51 Downloads / 242 Views

DOWNLOAD

REPORT


Abstract. This paper considers the problem of finding large dense subgraphs in relational graphs, i.e., a set of graphs which share a common vertex set. We present an approximation algorithm for finding the densest common subgraph in a relational graph set based on an extension of Charikar’s method for finding the densest subgraph in a single graph. We also present a simple greedy heuristic which can be implemented efficiently for analysis of larger graphs. We give graph dependent bounds on the quality of the solutions returned by our methods. Lastly, we show by empirical evaluation on several benchmark datasets that our method out-performs existing approaches.

1

Introduction

Finding dense subgraphs is a key subtask in many applications [see 21, forasurvey]. In many contexts, there exist several graphs encoding different relationships between the same set of actors. Then, a subset of actors having high degree of interconnections (dense) which recur in multiple graphs (frequent) often have a rich interpretation in the application domain. For example, dense recurrent subgraphs in multiple gene co-expression networks have been shown to correspond to known functional/transcriptional modules or protein complexes as well as phenotypespecific modules [12,24]. Several data-mining methods have addressed the problem of enumerating dense recurrent subgraphs [see,e.g., 6,12,13,24,26,31,33]. Most existing approaches enumerate all frequent quasicliques depending on parameters such as minimum support threshold and minimum relative density. This results in exponential growth of search space with increasing size of the returned subgraph, making the methods unsuitable for identifying large dense subgraphs in multiple graphs. The approach of [24] yields a non-convex cubic programming problem which is solved approximately using multi-stage convex relaxation [34] and used in the analysis of coexpression networks in order to identify small biologically relevant modules. [13] present a method to identify large dense subgraphs based on solving a Multiple Kernel Learning (MKL) problem [3,27] with precomputed kernels. On the other hand, there is a rich body of work on approximation algorithms which addresses the problem of finding the densest subgraph (DS) [7,11] and its size-constrained variants (DkS, DalkS, DamkS) [1,5,20] including greedy approaches [2,9,28], truncated power method [32], linear programming (LP) based methods [7,20] and semidefinite programming (SDP) based methods c Springer International Publishing Switzerland 2015  A. Appice et al. (Eds.): ECML PKDD 2015, Part II, LNAI 9285, pp. 641–654, 2015. DOI: 10.1007/978-3-319-23525-7 39

642

V. Jethava and N. Beerenwinkel

[8,29]. We note that given a relational graph set 1 , G := {G(m) = (V, E (m) )}M m=1 , , G or G having edge it is possible to construct integrated graphs, e.g., G ∪ ∩ ≥t   sets respectively, E (m) , E (m) or {e | supp(e, G) ≥ t}) 2 ; and identify dense subgraphs in the integrated graph using these methods. However, as noted by [16], a dense subgraph in the integr