Parallel Matrix Transposition and Vector Multiplication Using OpenMP

In this chapter, we propose two parallel algorithms for sparse matrix transposition and vector multiplication using CSR format: with and without actual matrix transposition. Both algorithms are parallelized using OpenMP. Experimentations are run on a quad

PDF / 425,912 Bytes
7 Pages / 439.37 x 666.142 pts Page_size
103 Downloads / 832 Views

DOWNLOAD

REPORT

Abstract In this chapter, we propose two parallel algorithms for sparse matrix transposition and vector multiplication using CSR format: with and without actual matrix transposition. Both algorithms are parallelized using OpenMP. Experimentations are run on a quad-core Intel Xeon64 CPU E5507. We measure and compare the performance of our algorithms with that of using CSB scheme. Our experimental results show that actual matrix transposition algorithm is comparable to the CSB-based algorithm; on the other hand, direct sparse matrix-transposevector multiplication using CSR significantly outperforms CSB-based algorithm. Keywords OpenMP • Sparse matrix transposition

1 Introduction The sparse matrix transposition and vector multiplication are important among many scientific computing applications. CSR (compressed sparse row) and CSC (compressed sparse column) are the simplest formats. Many parallel matrix-vector multiplications using CSR have been proposed [1, 2], but the matrix transposition is also important in many scientific computations such as iterative linear solvers. An efficient matrix transposition and vector multiplication algorithm has been actively researched, and most of the recent works have been using CSB (compressed sparse block) format [3]. Mateescu et al. [4] develop a dense matrix transpose rather than sparse that runs on POWER7 machine by utilizing its cache model and prefetching technique. Gustavson [5] proposed two fast serial algorithms for sparse matrix multiplication and permuted transposition using CSR; they were serial codes.

T.-H. Weng (*) • D. Batjargal • H. Pham • M.-Y. Hsieh • K.-C. Li Department of Computer Science and Information Engineering, Providence University, Taichung 43301, Taiwan e-mail: [email protected] J. Juang and Y.-C. Huang (eds.), Intelligent Technologies and Engineering Systems, Lecture Notes in Electrical Engineering 234, DOI 10.1007/978-1-4614-6747-2_30, # Springer Science+Business Media New York 2013

243

244

T.-H. Weng et al.

In this chapter, we proposed two algorithms for sparse matrix transposition and vector multiplication using CSR: with and without actual transposition. We use OpenMP [6] because it requires less effort and it is portable across shared memory environment. Other advantages include incremental parallelization and easier to program, debug, understand, and maintain. We compare ours to the previous research work on parallel sparse matrix-vector multiplication which used the CSB by Buluc¸ et al. [3]. This CSB scheme requires no actual transposition, and is implemented using Cilk++ [7] and written in C++ in recursive style, and can be downloaded from [8].

2 Sparse Storage Format In this section, we introduce CSR and CSB sparse matrix formats. CSR consists of three arrays, namely, val, idx, and ptr, as shown in Fig. 1a. Let A be a sparse matrix whose order is r by c and NNZ is the number of nonzero of A. The value of nonzero elements is stored in a double-precision array val of length NNZ. The column indices for each nonzero element are stored in t

Data Loading...

Parallel Matrix Transposition and Vector Multiplication Using OpenMP

Recommend Documents

Using Transposition to Efficiently Solve Constant Matrix-Vector Multiplication and Sum of Product Problems

HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs

Designing Parallel Sparse Matrix Transposition Algorithm Using CSR for GPUs

Analysis of Partitioning Models and Metrics in Parallel Sparse Matrix-Vector Multiplication

Fast hybrid matrix multiplication algorithms

Algorithms for In-Place Matrix Transposition

Inexact Sparse Matrix Vector Multiplication in Krylov Subspace Methods: An Application-Oriented Reduction Method

All Pairs Shortest Paths via Matrix Multiplication

New fast hybrid matrix multiplication algorithms

A fast cellular method of matrix multiplication

A New Fast Recursive Matrix Multiplication Algorithm

OpenMP Shared Memory Parallel Programming International Workshop on