Online summarization of dynamic graphs using subjective interestingness for sequential data

  • PDF / 3,558,087 Bytes
  • 39 Pages / 439.37 x 666.142 pts Page_size
  • 121 Downloads / 181 Views

DOWNLOAD

REPORT


Online summarization of dynamic graphs using subjective interestingness for sequential data Sarang Kapoor1

· Dhish Kumar Saxena2

· Matthijs van Leeuwen3

Received: 10 January 2020 / Accepted: 19 August 2020 © The Author(s) 2020

Abstract Many real-world phenomena can be represented as dynamic graphs, i.e., networks that change over time. The problem of dynamic graph summarization, i.e., to succinctly describe the evolution of a dynamic graph, has been widely studied. Existing methods typically use objective measures to find fixed structures such as cliques, stars, and cores. Most of the methods, however, do not consider the problem of online summarization, where the summary is incrementally conveyed to the analyst as the graph evolves, and (thus) do not take into account the knowledge of the analyst at a specific moment in time. We address this gap in the literature through a novel, generic framework for subjective interestingness for sequential data. Specifically, we iteratively identify atomic changes, called ‘actions’, that provide most information relative to the current knowledge of the analyst. For this, we introduce a novel information gain measure, which is motivated by the minimum description length (MDL) principle. With this measure, our approach discovers compact summaries without having to decide on the number of patterns. As such, we are the first to combine approaches for data mining based on subjective interestingness (using the maximum entropy principle) with pattern-based summarization (using the MDL principle). We instantiate this framework for dynamic graphs and dense subgraph patterns, and present DSSG, a heuristic algorithm for the online summarization of dynamic graphs by means of informative actions, each of which represents an interpretable change to the connectivity structure of the graph. The experiments on real-world data demonstrate that our approach effectively discovers informative summaries. We conclude with a case study on data from an airline network to show its potential for real-world applications. Keywords Graph summarization · Maximum entropy principle · Subjective interestingness · Dynamic graphs

Responsible editor: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier

B

Matthijs van Leeuwen [email protected]

Extended author information available on the last page of the article

123

S. Kapoor et al.

1 Introduction Many real-world phenomena, including interactions between people (e.g., social media, e-mail), web browsing, transport and logistics operations, and asset management, can be modelled in terms of the relationships between entities. That is, the corresponding data can be naturally represented as a network or graph, where vertices represent the entities and edges represent their relationships. When these relationships change over time, the graphs are called dynamic graphs. The problem of static graph summarization has been widely studied, e.g., to efficiently store large volumes of data (Navlakha et al. 2008); improve query efficien