Fast Inbound Top-K Query for Random Walk with Restart

Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR me

  • PDF / 551,395 Bytes
  • 17 Pages / 439.37 x 666.142 pts Page_size
  • 41 Downloads / 174 Views

DOWNLOAD

REPORT


Abstract. Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k, the Ink query aims at retrieving k nodes in the graph that have the largest weighted RWR scores to q. Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top-k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q, the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and thus upper bounded. Ripple incrementally expands the vicinity until the top-k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and the proposed algorithms are orders of magnitude faster than stateof-the-art method.

1

Introduction

Graphs have long been considered as one of the most important structures that can naturally model numerous real-life data objects (e.g., the Web, social network, protein-protein interaction network). In most graph-related applications, it is fundamental to quantify node-to-node structural proximity. Among existing structural proximity measures, random walk with restart (RWR) is recognized as one of the most important, and has been widely adopted in Web search [15], item recommendation [12], link prediction [13], graph clustering [2], and many other tasks. Compared with other proximity measures like shortest path, RWR enjoys the nice property of capturing the holistic graph structure and being robust to noise in the graph. c Springer International Publishing Switzerland 2015  A. Appice et al. (Eds.): ECML PKDD 2015, Part II, LNAI 9285, pp. 608–624, 2015. DOI: 10.1007/978-3-319-23525-7 37

Fast Inbound Top-K Query for Random Walk with Restart

609

To date, much research effort has been devoted to RWR, including its efficient computation ([6], [16], [5], [7], [18], [8], [14]), top-k search ([7], [10], [3], [17]), and various mining tasks underpinned by RWR ([13], [2], [12]). However, insufficient attention has been paid to a fundamental task that arises in many graph-related applications, which is to determine the source nodes that have a large amount of information flowing to a given query node. To illustrate, consider a traffic flow network shown in Figure 1. Assume