Scalable and efficient graph traversal on high-throughput cluster
- PDF / 2,092,604 Bytes
- 13 Pages / 595.276 x 790.866 pts Page_size
- 48 Downloads / 215 Views
REGULAR PAPER
Scalable and efficient graph traversal on high‑throughput cluster Dongrui Fan1,2 · Huawei Cao1 · Guobo Wang1,2 · Na Nie1,2 · Xiaochun Ye1,3 · Ninghui Sun1,2 Received: 15 June 2020 / Accepted: 22 October 2020 © China Computer Federation (CCF) 2020
Abstract Graph is one of the most important data structures in modern big data applications and is widely used in various fields. Among many graph algorithms, the Breadth-First Search (BFS) algorithm is a classic algorithm to solve the graph traversal problem and also the key kernel of Graph500 benchmark. On modern CPU architecture, the implementation of graph traversal on single-node systems has achieved significant improvement. However, due to the low resource utilization and high communications overhead, graph traversal on distributed clusters suffers from poor performance and energy inefficiency. Highthroughput cluster (HTCs) adopt High-Throughput many-core architecture, which has the characteristics of high concurrency, strong real-time, and low-power consumption. In this work, we propose several techniques, including asynchronous virtual ring method, thread caching scheme and vertex ID reordering to solve above problems and improve BFS performance on HTCs. We systematically evaluate optimized BFS algorithm and achieve 249.74 giga-traversed edges per second (GTEPS) on 72 nodes (2880 cores) HTCs. Compared with results on Graph500 list, the optimized algorithm achieves the highest node efficiency under the same cluster scale and the performance shows weakly linear scalability as the number of cluster nodes increases. With regard to efficiency, the average performance on HTCs is 3.47 GTEPS/node, which is the best among CPU-based distributed systems on the November 2019 Graph500 list. Keywords BFS · Parallel computing · Graph500 · Graph traversal
1 Introduction Graph is one of the important data structures in modern big data applications. Many problems in reality can be abstracted and described with graph. Recently, there has been a burst of interest in correlation analysis with graph, such as social networks, traffic networks, protein structures, and knowledge graphs, etc. (Barabasi and Albert 1999; Haveliwala 2003; Mislove et al. 2007; Bader et al. 2008). However, Graph applications that analyze such data often suffer from poor data locality, high communications overhead, low parallel
efficiency and poor scalability, which are the main characteristics of data-intensive applications. The characteristics of data-intensive applications are quite different from those of traditional high-performance computing (HPC) applications. The Linpack benchmark used in Top500 can no longer accurately reflect performance of modern architecture on data-intensive applications. In 2011, the Graph500 list was proposed to evaluate the ability of servers and supercomputers for the data-intensive applications (Ang et al. 2010). BFS has been chosen as the kernel of Graph500. BFS is a fundamental graph traversal algorithm
* Huawei Cao [email protected] 1
State Key Lab
Data Loading...