Scalable and efficient graph traversal on high-throughput cluster

PDF / 2,092,604 Bytes
13 Pages / 595.276 x 790.866 pts Page_size
48 Downloads / 239 Views

REGULAR PAPER

Scalable and efficient graph traversal on high‑throughput cluster Dongrui Fan1,2 · Huawei Cao1 · Guobo Wang1,2 · Na Nie1,2 · Xiaochun Ye1,3 · Ninghui Sun1,2 Received: 15 June 2020 / Accepted: 22 October 2020 © China Computer Federation (CCF) 2020

Abstract Graph is one of the most important data structures in modern big data applications and is widely used in various fields. Among many graph algorithms, the Breadth-First Search (BFS) algorithm is a classic algorithm to solve the graph traversal problem and also the key kernel of Graph500 benchmark. On modern CPU architecture, the implementation of graph traversal on single-node systems has achieved significant improvement. However, due to the low resource utilization and high communications overhead, graph traversal on distributed clusters suffers from poor performance and energy inefficiency. Highthroughput cluster (HTCs) adopt High-Throughput many-core architecture, which has the characteristics of high concurrency, strong real-time, and low-power consumption. In this work, we propose several techniques, including asynchronous virtual ring method, thread caching scheme and vertex ID reordering to solve above problems and improve BFS performance on HTCs. We systematically evaluate optimized BFS algorithm and achieve 249.74 giga-traversed edges per second (GTEPS) on 72 nodes (2880 cores) HTCs. Compared with results on Graph500 list, the optimized algorithm achieves the highest node efficiency under the same cluster scale and the performance shows weakly linear scalability as the number of cluster nodes increases. With regard to efficiency, the average performance on HTCs is 3.47 GTEPS/node, which is the best among CPU-based distributed systems on the November 2019 Graph500 list. Keywords BFS · Parallel computing · Graph500 · Graph traversal

1 Introduction Graph is one of the important data structures in modern big data applications. Many problems in reality can be abstracted and described with graph. Recently, there has been a burst of interest in correlation analysis with graph, such as social networks, traffic networks, protein structures, and knowledge graphs, etc. (Barabasi and Albert 1999; Haveliwala 2003; Mislove et al. 2007; Bader et al. 2008). However, Graph applications that analyze such data often suffer from poor data locality, high communications overhead, low parallel

efficiency and poor scalability, which are the main characteristics of data-intensive applications. The characteristics of data-intensive applications are quite different from those of traditional high-performance computing (HPC) applications. The Linpack benchmark used in Top500 can no longer accurately reflect performance of modern architecture on data-intensive applications. In 2011, the Graph500 list was proposed to evaluate the ability of servers and supercomputers for the data-intensive applications (Ang et al. 2010). BFS has been chosen as the kernel of Graph500. BFS is a fundamental graph traversal algorithm

* Huawei Cao [email protected] 1

State Key Lab

Data Loading...

Scalable and efficient graph traversal on high-throughput cluster

Recommend Documents

Graph Traversal

Compressed graph representation for scalable molecular graph generation

Scalable attack on graph data by injecting vicious nodes

Dynamic, Scalable, and Efficient Content Replication Techniques

Scalable In-Memory Graph Pattern Matching on Symmetric Multiprocessor Systems

Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction

SMART: An Efficient, Scalable, and Robust Streaming Video System

Engineering Scalable, Elastic, and Cost-Efficient Cloud Computing Applications

Efficient random graph matching via degree profiles

GPUs-RRTMG_LW: high-efficient and scalable computing for a longwave radiative transfer model on multiple GPUs

Computationally Efficient MCTF for MC-EZBC Scalable Video Coding Framework

An Efficient Algorithm for Finding Frequent Sequential Traversal Patterns from Web Logs Based on Dynamic Weight Constrai