Parallel Matrix Transposition and Vector Multiplication Using OpenMP

In this chapter, we propose two parallel algorithms for sparse matrix transposition and vector multiplication using CSR format: with and without actual matrix transposition. Both algorithms are parallelized using OpenMP. Experimentations are run on a quad

  • PDF / 425,912 Bytes
  • 7 Pages / 439.37 x 666.142 pts Page_size
  • 103 Downloads / 832 Views

DOWNLOAD

REPORT


Abstract In this chapter, we propose two parallel algorithms for sparse matrix transposition and vector multiplication using CSR format: with and without actual matrix transposition. Both algorithms are parallelized using OpenMP. Experimentations are run on a quad-core Intel Xeon64 CPU E5507. We measure and compare the performance of our algorithms with that of using CSB scheme. Our experimental results show that actual matrix transposition algorithm is comparable to the CSB-based algorithm; on the other hand, direct sparse matrix-transposevector multiplication using CSR significantly outperforms CSB-based algorithm. Keywords OpenMP • Sparse matrix transposition

1 Introduction The sparse matrix transposition and vector multiplication are important among many scientific computing applications. CSR (compressed sparse row) and CSC (compressed sparse column) are the simplest formats. Many parallel matrix-vector multiplications using CSR have been proposed [1, 2], but the matrix transposition is also important in many scientific computations such as iterative linear solvers. An efficient matrix transposition and vector multiplication algorithm has been actively researched, and most of the recent works have been using CSB (compressed sparse block) format [3]. Mateescu et al. [4] develop a dense matrix transpose rather than sparse that runs on POWER7 machine by utilizing its cache model and prefetching technique. Gustavson [5] proposed two fast serial algorithms for sparse matrix multiplication and permuted transposition using CSR; they were serial codes.

T.-H. Weng (*) • D. Batjargal • H. Pham • M.-Y. Hsieh • K.-C. Li Department of Computer Science and Information Engineering, Providence University, Taichung 43301, Taiwan e-mail: [email protected] J. Juang and Y.-C. Huang (eds.), Intelligent Technologies and Engineering Systems, Lecture Notes in Electrical Engineering 234, DOI 10.1007/978-1-4614-6747-2_30, # Springer Science+Business Media New York 2013

243

244

T.-H. Weng et al.

In this chapter, we proposed two algorithms for sparse matrix transposition and vector multiplication using CSR: with and without actual transposition. We use OpenMP [6] because it requires less effort and it is portable across shared memory environment. Other advantages include incremental parallelization and easier to program, debug, understand, and maintain. We compare ours to the previous research work on parallel sparse matrix-vector multiplication which used the CSB by Buluc¸ et al. [3]. This CSB scheme requires no actual transposition, and is implemented using Cilk++ [7] and written in C++ in recursive style, and can be downloaded from [8].

2 Sparse Storage Format In this section, we introduce CSR and CSB sparse matrix formats. CSR consists of three arrays, namely, val, idx, and ptr, as shown in Fig. 1a. Let A be a sparse matrix whose order is r by c and NNZ is the number of nonzero of A. The value of nonzero elements is stored in a double-precision array val of length NNZ. The column indices for each nonzero element are stored in t