Mining explainable local and global subgraph patterns with surprising densities

  • PDF / 2,177,521 Bytes
  • 51 Pages / 439.37 x 666.142 pts Page_size
  • 11 Downloads / 127 Views

DOWNLOAD

REPORT


Mining explainable local and global subgraph patterns with surprising densities Junning Deng1

· Bo Kang1 · Jefrey Lijffijt1 · Tijl De Bie1

Received: 13 February 2020 / Accepted: 21 October 2020 © The Author(s) 2020

Abstract The connectivity structure of graphs is typically related to the attributes of the vertices. In social networks for example, the probability of a friendship between any pair of people depends on a range of attributes, such as their age, residence location, workplace, and hobbies. The high-level structure of a graph can thus possibly be described well by means of patterns of the form ‘the subgroup of all individuals with certain properties X are often (or rarely) friends with individuals in another subgroup defined by properties Y’, ideally relative to their expected connectivity. Such rules present potentially actionable and generalizable insight into the graph. Prior work has already considered the search for dense subgraphs (‘communities’) with homogeneous attributes. The first contribution in this paper is to generalize this type of pattern to densities between a pair of subgroups, as well as between all pairs from a set of subgroups that partition the vertices. Second, we develop a novel information-theoretic approach for quantifying the subjective interestingness of such patterns, by contrasting them with prior information an analyst may have about the graph’s connectivity. We demonstrate empirically that in the special case of dense subgraphs, this approach yields results that are superior to the state-of-the-art. Finally, we propose algorithms for efficiently finding interesting patterns of these different types.

Responsible editor: M. J. Zaki

B

Junning Deng [email protected] Bo Kang [email protected] Jefrey Lijffijt [email protected] Tijl De Bie [email protected]

1

IDLab, Ghent University, Technologiepark-Zwijnaarde 122, Ghent, Belgium

123

J. Deng et al.

Keywords Graph mining · Graph summarization · Graph clustering · Community detection · Subjective interestingness · Subgroup discovery

1 Introduction Real-life graphs (also known as networks) often contain attributes for the vertices. In social networks for example, where vertices correspond to individuals, vertex attributes can include the individuals’ interests, education, residency, and more. The connectivity of the network is usually highly related to those attributes (Fond and Neville 2010; McPherson et al. 2001; Aral et al. 2009; Li et al. 2017). The attributes of individuals affect the likelihood of them meeting in the first place, and, if they meet, of becoming friends. Hence, it appears likely it should be possible to understand the connectivity of a graph in terms of those attributes, at least to a certain extent. One approach to identify the relations between the connectivity and the attributes is to train a link prediction classifier, with as input the attribute values of a vertex pair, predicting the edge as present or absent (Gong et al. 2014; Yin et al. 2010; Barbieri et al. 2014; Wei et al. 2017). S