Dynamic Sparse-Matrix Allocation on GPUs
Abstract. Sparse matrices are a core component in many numerical simulations, and their efficiency is essential to achieving high performance. Dynamic sparse-matrix allocation (insertion) can benefit a number of problems such as sparse-matrix factorization, sparse matrix-matrix addition, static analysis (e.g., points-to analysis), computing transitive closure, and other graph algorithms. Existing sparse-matrix formats are poorly designed to handle dynamic updates. The compressed sparse-row (CSR) format is fully compact and must be rebuilt after each new entry. Ellpack (ELL) stores a constant number of entries per row, which allows for efficient insertion and sparse matrix-vector multiplication (SpMV) but is memory inefficient and strictly limits row size. The coordinate (COO) format stores a list of entries and is efficient for both memory use and insertion time; however, it is much less efficient at SpMV. Hybrid ellpack (HYB) compromises by using a combination of ELL and COO but degrades in performance as the COO portion fills up. Rows that use the COO portion require it to be completely traversed during every SpMV operation. In this paper we introduce a new sparse-matrix format, dynamic compressed sparse row (DCSR), that permits efficient dynamic updates. These updates are significantly faster than those made to a HYB matrix while maintaining SpMV times comparable to CSR. We demonstrate the efficacy of our dynamic allocation scheme, evaluating updates and SpMV operations on adjacency matrices of sparse-graph benchmarks on the GPU.
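For concreteness, the following sketch outlines the three standard layouts compared above. The struct and field names are illustrative only (not taken from the paper), and zero-based indexing with single-precision values is assumed.

```cpp
// Schematic host-side layouts of the formats compared in the abstract.
// Illustrative field names; not the paper's data structures.
struct CSR {                  // fully compact: no slack for new entries,
    int    num_rows;          // so insertion forces a rebuild
    int   *row_offsets;       // length num_rows + 1
    int   *col_indices;       // length nnz
    float *values;            // length nnz
};

struct ELL {                  // fixed width per row: insertion is cheap,
    int    num_rows;          // but rows longer than `width` cannot fit
    int    width;             // max entries per row (padded otherwise)
    int   *col_indices;       // num_rows * width
    float *values;            // num_rows * width
};

struct COO {                  // unordered entry list: cheap to append,
    int    nnz;               // but SpMV must traverse the whole list
    int   *row_indices;
    int   *col_indices;
    float *values;
};
```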
1 Introduction
Sparse matrix-vector multiply (SpMV) is the workhorse operation of many numerical simulations and has seen use in a wide variety of areas such as data mining [1] and graph analytics [2]. In these algorithms, a majority of the total processing is often spent on SpMV operations. Iterative computations such as the power method and conjugate gradient are commonly used in numerical simulations and require successive SpMV operations [3]. The use of GPUs has become increasingly common in computing these operations as they are, in principle, highly parallelizable. GPUs have both a high computational throughput and a high memory bandwidth. Operations on sparse matrices are generally memory bound; this makes the GPU a good target platform due to its higher memory bandwidth compared to that of the CPU, but it is still difficult to attain high performance with sparse matrices because of thread divergence and noncoalesced memory accesses.
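As a point of reference, the following is a minimal one-thread-per-row CSR SpMV kernel. It is a common textbook formulation rather than the paper's implementation; rows of differing length are what cause the thread divergence and noncoalesced accesses mentioned above.

```cuda
// Minimal one-thread-per-row CSR SpMV kernel: y = A * x.
// Textbook sketch for illustration; not the paper's kernel.
__global__ void spmv_csr_scalar(int num_rows,
                                const int   *row_offsets,  // length num_rows + 1
                                const int   *col_indices,  // length nnz
                                const float *values,       // length nnz
                                const float *x,
                                float       *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float sum = 0.0f;
        // Rows of different lengths make neighboring threads diverge,
        // and the gathers from x via col_indices are not coalesced.
        for (int j = row_offsets[row]; j < row_offsets[row + 1]; ++j)
            sum += values[j] * x[col_indices[j]];
        y[row] = sum;
    }
}
```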
Some applications require dynamic updates to the matrix; generally construed, updates may include inserting or deleting entries. Fully compressed formats such as compressed sparse row (CSR) cannot handle these operations without rebuilding the entire matrix. Rebuilding the matrix is orders of magnitude more costly than performing an SpMV operation. The ellpack (ELL) format allocates a constant number of entries per row, which makes insertion cheap but strictly limits row size and wastes memory on padding.
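To make the cost of in-place CSR insertion concrete, the host-side sketch below (illustrative only, not from the paper) shows that adding a single entry shifts every later entry and bumps every later row offset, i.e., O(nnz) work per insertion.

```cpp
// Why a single insertion into CSR is expensive: the tail of the entry
// arrays must shift and all subsequent row offsets must be incremented.
// Illustrative host-side code; names are not the paper's.
#include <vector>

struct CsrHost {
    std::vector<int>   row_offsets;   // length num_rows + 1
    std::vector<int>   col_indices;   // length nnz
    std::vector<float> values;        // length nnz
};

void csr_insert(CsrHost &A, int row, int col, float val)
{
    int pos = A.row_offsets[row + 1];  // append at the end of this row
                                       // (ignoring column order within the row)
    A.col_indices.insert(A.col_indices.begin() + pos, col);  // shifts the tail
    A.values.insert(A.values.begin() + pos, val);            // shifts the tail
    for (int r = row + 1; r < (int)A.row_offsets.size(); ++r)
        ++A.row_offsets[r];            // bump every later offset (and nnz)
}
```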