Discovering subjectively interesting multigraph patterns

  • PDF / 4,979,111 Bytes
  • 28 Pages / 439.37 x 666.142 pts Page_size
  • 50 Downloads / 215 Views

DOWNLOAD

REPORT


Discovering subjectively interesting multigraph patterns Sarang Kapoor1   · Dhish Kumar Saxena2   · Matthijs van Leeuwen3  Received: 18 May 2019 / Revised: 14 October 2019 / Accepted: 13 February 2020 © The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract Over the past decade, network analysis has attracted substantial interest because of its potential to solve many real-world problems. This paper lays the conceptual foundation for an application in aviation, through focusing on the discovery of patterns in multigraphs (graphs in which multiple edges can be present between vertices). Our main contributions are twofold. Firstly, we propose a novel subjective interestingness measure for patterns in both undirected and directed multigraphs. Though this proposal is inspired by our previous related research for simple graphs (having only single edges), the properties of multigraphs make this transition challenging. Secondly, we propose a greedy algorithm for subjectively interesting pattern mining, and demonstrate its efficacy through several experiments on synthetic and real-world examples. We conclude with a case study in aviation, which demonstrates how the departure from an analyst’s prior beliefs captured as subjectively interesting patterns could help improve an analyst’s understanding of the data and problem at hand. Keywords  Multigraph · Subjective interestingness · Maximum entropy principle · Exploratory data mining

1 Introduction Over the past decade, researchers have realised that network analysis can be used to address many real-world problems. Examples include problems related to computer network infrastructure, co-authorship (scientific or other), co-actors (e.g., in movies), transport (road, Editors: Kee-Eung Kim and Jun Zhu. * Sarang Kapoor [email protected] Dhish Kumar Saxena [email protected] Matthijs van Leeuwen [email protected] 1

Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India

2

Department of Mechanical and Industrial Engineering, Indian Institute of Technology, Roorkee, India

3

Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands



13

Vol.:(0123456789)



Machine Learning

Fig. 1  An airline transportation network modelled as directed multigraph

airline, ...), and even tax evasion (Newman 2010). This has led to research on several types of networks, typically modelled as simple graphs (graphs having at most one edge between any pair of vertices) and weighted graphs (simple graphs but with weights on edges). A type of network that, to the best of our knowledge, has not yet been widely considered in the data mining literature1 is one that needs to be modelled as a multigraph (graph in which multiple edges can be present between any pair of vertices). Motivated by an application in aviation, this paper lays the conceptual foundations for the discovery of subjectively interesting multigraphs patterns (SIMPs). SIMPs