Robust keyword search in large attributed graphs

  • PDF / 2,199,529 Bytes
  • 23 Pages / 439.37 x 666.142 pts Page_size
  • 61 Downloads / 240 Views

DOWNLOAD

REPORT


Robust keyword search in large attributed graphs Spencer Bryson1 · Heidar Davoudi1 · Lukasz Golab2 · Mehdi Kargar3   · Yuliya Lytvyn2 · Piotr Mierzejewski4 · Jaroslaw Szlichta1,4 · Morteza Zihayat3,4 Received: 4 November 2019 / Accepted: 13 July 2020 © Springer Nature B.V. 2020

Abstract There is a growing need to explore attributed graphs such as social networks, expert networks, and biological networks. A well-known mechanism for non-technical users to explore such graphs is keyword search, which receives a set of query keywords and returns a connected subgraph that contains the keywords. However, existing approaches, such as methods based on shortest paths between nodes containing the query keywords, may generate weakly-connected answers as they ignore the structure of the whole graph. To address this issue, we formulate and solve the robust keyword search problem for attributed graphs to find strongly-connected answers. We prove that the problem is NP-hard and we propose a solution based on a random walk with restart (RWR). To improve the efficiency and scalability of RWR, we use Monte Carlo approximation and we also propose a distributed version, which we implement in Apache Spark. Finally, we provide experimental evidence of the efficiency and effectiveness of our approach on real-world graphs. Keywords  Attributed graphs · Social networks · Keyword search

1 Introduction Increasingly, organizations and communities are focusing on graph analysis to make faster and better decisions leading to business and societal impact. Attributed graphs, i.e., those with attributes or labels attached to their nodes and/or edges, are especially common. Examples include social networks (e.g., Facebook and Twitter, whose nodes may be labelled with user information) and biological networks (e.g., protein-protein interaction networks whose nodes are labeled with protein properties). Moreover, other data models can also be naturally represented as attributed graphs. For example, a relational database instance corresponds to a graph whose nodes are tuples (labelled with the attributes of the

* Mehdi Kargar [email protected] * Morteza Zihayat [email protected] Extended author information available on the last page of the article

13

Vol.:(0123456789)



Information Retrieval Journal

Customer

Customer_Account

C_ID C_Name

Email

CA_ID CA_C_ID CA_Name

10

David Jackson

[email protected]

22

20

Sarah Jackson

[email protected]

10

Saving Account

Trade T_ID

Date

Price

Quantity T_CA_ID

52

2019-05-05

$40.25

100

22

Security

Company

S_SYMB

S_CO_ID S_Name

CO_ID CO_Name

ADTK

84

84

COMMON of ADTK

Customer_Account Saving Account

Adept Inc.

Customer David Jackson

Trade Date: 2019-05-05 Price: $40.25 Quantity: 100 Security COMMON of ADTK

Company Adept Inc.

Customer Sarah Jackson

Fig. 1  A portion of the TPC-E database and its associated attributed graph

tuples) and whose edges denote foreign key relationships. Given the wealth of information contained in attributed graphs, exploring and mining them is of critical importance (Ka