Robust keyword search in large attributed graphs
- PDF / 2,199,529 Bytes
- 23 Pages / 439.37 x 666.142 pts Page_size
- 61 Downloads / 271 Views
		    Robust keyword search in large attributed graphs Spencer Bryson1 · Heidar Davoudi1 · Lukasz Golab2 · Mehdi Kargar3   · Yuliya Lytvyn2 · Piotr Mierzejewski4 · Jaroslaw Szlichta1,4 · Morteza Zihayat3,4 Received: 4 November 2019 / Accepted: 13 July 2020 © Springer Nature B.V. 2020
 
 Abstract There is a growing need to explore attributed graphs such as social networks, expert networks, and biological networks. A well-known mechanism for non-technical users to explore such graphs is keyword search, which receives a set of query keywords and returns a connected subgraph that contains the keywords. However, existing approaches, such as methods based on shortest paths between nodes containing the query keywords, may generate weakly-connected answers as they ignore the structure of the whole graph. To address this issue, we formulate and solve the robust keyword search problem for attributed graphs to find strongly-connected answers. We prove that the problem is NP-hard and we propose a solution based on a random walk with restart (RWR). To improve the efficiency and scalability of RWR, we use Monte Carlo approximation and we also propose a distributed version, which we implement in Apache Spark. Finally, we provide experimental evidence of the efficiency and effectiveness of our approach on real-world graphs. Keywords  Attributed graphs · Social networks · Keyword search
 
 1 Introduction Increasingly, organizations and communities are focusing on graph analysis to make faster and better decisions leading to business and societal impact. Attributed graphs, i.e., those with attributes or labels attached to their nodes and/or edges, are especially common. Examples include social networks (e.g., Facebook and Twitter, whose nodes may be labelled with user information) and biological networks (e.g., protein-protein interaction networks whose nodes are labeled with protein properties). Moreover, other data models can also be naturally represented as attributed graphs. For example, a relational database instance corresponds to a graph whose nodes are tuples (labelled with the attributes of the
 
 * Mehdi Kargar [email protected] * Morteza Zihayat [email protected] Extended author information available on the last page of the article
 
 13
 
 Vol.:(0123456789)
 
 
 
 Information Retrieval Journal
 
 Customer
 
 Customer_Account
 
 C_ID C_Name
 
 Email
 
 CA_ID CA_C_ID CA_Name
 
 10
 
 David Jackson
 
 [email protected]
 
 22
 
 20
 
 Sarah Jackson
 
 [email protected]
 
 10
 
 Saving Account
 
 Trade T_ID
 
 Date
 
 Price
 
 Quantity T_CA_ID
 
 52
 
 2019-05-05
 
 $40.25
 
 100
 
 22
 
 Security
 
 Company
 
 S_SYMB
 
 S_CO_ID S_Name
 
 CO_ID CO_Name
 
 ADTK
 
 84
 
 84
 
 COMMON of ADTK
 
 Customer_Account Saving Account
 
 Adept Inc.
 
 Customer David Jackson
 
 Trade Date: 2019-05-05 Price: $40.25 Quantity: 100 Security COMMON of ADTK
 
 Company Adept Inc.
 
 Customer Sarah Jackson
 
 Fig. 1  A portion of the TPC-E database and its associated attributed graph
 
 tuples) and whose edges denote foreign key relationships. Given the wealth of information contained in attributed graphs, exploring and mining them is of critical importance (Ka		
Data Loading...
 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	 
	