Graph pattern matching with counting quantifiers and label-repetition constraints

  • PDF / 2,976,495 Bytes
  • 25 Pages / 595.276 x 790.866 pts Page_size
  • 100 Downloads / 207 Views

DOWNLOAD

REPORT


(0123456789().,-volV)(0123456789(). ,- volV)

Graph pattern matching with counting quantifiers and label-repetition constraints Houari Mahfoud1 Received: 2 May 2019 / Revised: 2 May 2019 / Accepted: 16 August 2019  Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract In recent years, we have witnessed an increasing use of graph pattern matching in a wide variety of applications such as social networks analysis, knowledge discovery, software plagiarism detection and many more. It is typically defined in terms of subgraph isomorphism, an NP-COMPLETE problem. To overcome this cost, many extensions of graph simulation have been proposed that allow graph pattern matching to be conducted in cubic-time. However, in emerging applications, more expressive patterns are needed, notably ones with counting quantifiers (CQs) which are not considered by simulationbased approaches. In this article, we propose a simulation-based graph pattern matching approach that supports CQs on edges of graph patterns. We first consider CQs that express numeric aggregates only. We show that our approach is in PTIME as earlier extensions of graph simulation by providing a cubic-time quantified matching algorithm, i.e., an algorithm for matching graph patterns that contain CQs. In the second part, we discuss the problem of Label-Repetition Constraints (LRCs). We define a necessary and sufficient condition for the satisfaction of LRCs. Based on this condition, we give an extension of our quantified matching algorithm to deal with LRCs in PTIME, together with an optimization technique. Finally, we show that our quantified graph pattern matching approach retains the same complexity bounds when dealing with ratio aggregates. To our knowledge, this is the first effort to deal with numeric aggregates, ratio aggregates, and LRCs on graph patterns in PTIME. Keywords Graph simulation  Quantified graph pattern matching  Label-repetition constraints

1 Introduction Graph pattern matching is a routine process for a wide variety of applications such as social network analysis [10], software plagiarism detection [21], fraud detection [4], intelligent analysis [6], team formation [24] and many others. Given a data graph G and a graph pattern P, it is to find all subgraphs of G that match P. Matching here is typically expressed in terms of subgraph isomorphism which consists to find all subgraphs of G that are isomorphic to P. However, subgraph isomorphism is an NPCOMPLETE problem [8]. The number of matches via subgraph isomorphism may be exponential in the size of the data graph, and in addition, they are all required to have the

& Houari Mahfoud [email protected] 1

same topology as the graph pattern which is too restrictive to catch sensible matches in emerging applications as observed in [11, 27]. To reduce the cost and loosen the restriction, graph simulation [28] and its extensions [11, 12, 22, 23] have been proposed that allow graph pattern matching to be conducted in polynomial time by focusing on some topological