Towards distributed node similarity search on graphs
- PDF / 2,418,353 Bytes
- 29 Pages / 439.642 x 666.49 pts Page_size
- 88 Downloads / 197 Views
Towards distributed node similarity search on graphs Tianming Zhang1 · Yunjun Gao1 · Baihua Zheng2 · Lu Chen3 · Shiting Wen4 · Wei Guo1 Received: 1 October 2018 / Revised: 15 January 2020 / Accepted: 22 April 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Node similarity search on graphs has wide applications in recommendation, link prediction, to name just a few. However, existing studies are insufficient due to two reasons: (i) the scale of the real-world graph is growing rapidly, and (ii) vertices are always associated with complex attributes. In this paper, we propose an efficiently distributed framework to support node similarity search on massive graphs, which considers both graph structure correlation and node attribute similarity in metric spaces. The framework consists of preprocessing stage and query stage. In the preprocessing stage, a parallel KD-tree construction (KDC) algorithm is developed to form a newly defined graph so-called hybrid graph, in order to integrate node attribute similarity into the original graph. To equally divide graph vertices into subsets, KDC adopts the KD-tree partitioning after the pivot mapping. In addition, two metric pruning rules and an optimized allocation strategy are presented to reduce communication and computation costs. In the query stage, based on the formed hybrid graph, we develop similarity search methods using random walk with restart (RWR) to measure node similarity. To boost efficiency, we derive tight bounds to rapidly shrink the search region. Extensive experiments with three real massive graphs are conducted to verify the effectiveness, efficiency, and scalability of our proposed techniques. Keywords Graph · Node similarity search · Distributed processing · Algorithm
1 Introduction Graph has been applied in diverse domains such as social network, bioinformatics and chemical datasets. Node similarity search on graphs has a wide application in recommendation [31], link prediction [11], etc. For instance, in a social network, node similarity search is able to be used to recommend like-minded friends for users. In an academic collaboration network, node similarity queries can be employed to find researchers’ preferred papers. Nowadays, the scale of the real-world graph is growing rapidly, and nodes of graphs are always associated with complex attributes such as image visual features, user profile, and paper keywords. Different types of attributes need various distance metrics [8] (e.g., Yunjun Gao
[email protected]
Extended author information available on the last page of the article.
World Wide Web
Minkowski distance, edit distance, Jaccard distance, etc.) to measure similarity. A large number of existing efforts [15, 22, 24, 25, 34] only rely on graph structure to calculate node similarity. However, to obtain high-quality answers, it is crucial to integrate node attribute similarity into node similarity. Example 1 Figure 1a depicts a directed citation network, where a node represents a paper associated with keywor
Data Loading...