A parallel sparse triangular solve algorithm based on dependency elimination of the solution vector
- PDF / 2,828,619 Bytes
- 14 Pages / 595.276 x 790.866 pts Page_size
- 36 Downloads / 166 Views
(0123456789().,-volV)(0123456789().,-volV)
A parallel sparse triangular solve algorithm based on dependency elimination of the solution vector Song Jin1 • Songwei Pei2 • Yu Wang1 • Yincheng Qi1 Received: 24 September 2019 / Revised: 15 September 2020 / Accepted: 26 September 2020 Ó Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Sparse triangular solve (SpTRSV) is an important kernel in many scientific computing applications. In traditional viewpoints, accelerating SpTRSV by parallelizing the solution process is a challenging task. Dependencies among the variables that exist in the solution process not only restrict the parallelism that can be achieved, but also introduce large synchronization overhead among the parallel tasks. Moreover, a time-consuming pre-processing phase is commonly required to identify calculations that can be parallelized. However, we have observed that a large number of dependencies among the variables can be eliminated if we only calculate partial values of the variables first and add them together to obtain the final values later. By using such a strategy, starting to solve a variable does not need to wait for all of its prerequisite variables having been solved. In consequence, parallelism of the SpTRSV can be increased significantly. In this paper, we transform above mentioned observations into a subtree-based parallel algorithm to accelerate SpTRSV. The proposed algorithm calculates partial values of the variable along with an implicit subtree traversal and utilizes hardware atomic operation to implement accumulation of the partial values. This not only introduces no pre-processing overhead, but also avoids any barrier synchronization among the parallel threads. We evaluate the proposed algorithm on 2135 matrices from SuiteSparse Matrix Collection based on a generic GPU platform. Experimental results demonstrate that our scheme outperforms the state-of-the-art GPU and CPU vendor libraries in 1949 and 1782 matrices, respectively. Compared with the latest synchronization-free method, our scheme outperforms in 1779 matrices. Keywords Sparse triangular solve Parallel algorithm Dependency elimination
1 Introduction Sparse triangular solve (SpTRSV) performs computation of the solution vector x for linear system Lx ¼ b, where L is a sparse lower triangular matrix and b is the right-hand side vector. SpTRSV is an indispensable kernel in many linear algebra routines, such as direct methods [1] and pre-conditioned iterative methods [2]. It is common for these routines to spend most of their execution time on SpTRSV.
& Song Jin [email protected] Songwei Pei [email protected] 1
North China Electric Power University, Baoding, China
2
Beijing University of Posts and Telecommunications, Beijing, China
For example, in the iterative conjugate gradient algorithm using an incomplete Cholesky or Gauss–Seidel pre-conditioner, up to 70% of the execution time accounts for SpTRSV [3]. Therefore, accelerating SpTRSV has great significance in reducing the whole exe
Data Loading...