Robust keyword search in large attributed graphs
- PDF / 2,199,529 Bytes
- 23 Pages / 439.37 x 666.142 pts Page_size
- 61 Downloads / 240 Views
Robust keyword search in large attributed graphs Spencer Bryson1 · Heidar Davoudi1 · Lukasz Golab2 · Mehdi Kargar3 · Yuliya Lytvyn2 · Piotr Mierzejewski4 · Jaroslaw Szlichta1,4 · Morteza Zihayat3,4 Received: 4 November 2019 / Accepted: 13 July 2020 © Springer Nature B.V. 2020
Abstract There is a growing need to explore attributed graphs such as social networks, expert networks, and biological networks. A well-known mechanism for non-technical users to explore such graphs is keyword search, which receives a set of query keywords and returns a connected subgraph that contains the keywords. However, existing approaches, such as methods based on shortest paths between nodes containing the query keywords, may generate weakly-connected answers as they ignore the structure of the whole graph. To address this issue, we formulate and solve the robust keyword search problem for attributed graphs to find strongly-connected answers. We prove that the problem is NP-hard and we propose a solution based on a random walk with restart (RWR). To improve the efficiency and scalability of RWR, we use Monte Carlo approximation and we also propose a distributed version, which we implement in Apache Spark. Finally, we provide experimental evidence of the efficiency and effectiveness of our approach on real-world graphs. Keywords Attributed graphs · Social networks · Keyword search
1 Introduction Increasingly, organizations and communities are focusing on graph analysis to make faster and better decisions leading to business and societal impact. Attributed graphs, i.e., those with attributes or labels attached to their nodes and/or edges, are especially common. Examples include social networks (e.g., Facebook and Twitter, whose nodes may be labelled with user information) and biological networks (e.g., protein-protein interaction networks whose nodes are labeled with protein properties). Moreover, other data models can also be naturally represented as attributed graphs. For example, a relational database instance corresponds to a graph whose nodes are tuples (labelled with the attributes of the
* Mehdi Kargar [email protected] * Morteza Zihayat [email protected] Extended author information available on the last page of the article
13
Vol.:(0123456789)
Information Retrieval Journal
Customer
Customer_Account
C_ID C_Name
Email
CA_ID CA_C_ID CA_Name
10
David Jackson
[email protected]
22
20
Sarah Jackson
[email protected]
10
Saving Account
Trade T_ID
Date
Price
Quantity T_CA_ID
52
2019-05-05
$40.25
100
22
Security
Company
S_SYMB
S_CO_ID S_Name
CO_ID CO_Name
ADTK
84
84
COMMON of ADTK
Customer_Account Saving Account
Adept Inc.
Customer David Jackson
Trade Date: 2019-05-05 Price: $40.25 Quantity: 100 Security COMMON of ADTK
Company Adept Inc.
Customer Sarah Jackson
Fig. 1 A portion of the TPC-E database and its associated attributed graph
tuples) and whose edges denote foreign key relationships. Given the wealth of information contained in attributed graphs, exploring and mining them is of critical importance (Ka
Data Loading...