Using Pentangular Factorizations for the Reduction to Banded Form
Department of Mathematics, University of Wuppertal, D-42097 Wuppertal, Germany
Computing Center, Aachen University of Technology, D-52074 Aachen, Germany
Abstract. Most methods for computing the singular value decomposition (SVD) first bidiagonalize the matrix. The ScaLAPACK implementation of the blocked reduction of a general dense matrix to bidiagonal form performs about one half of the operations with BLAS3. If we subdivide the task into two stages, dense → banded and banded → bidiagonal, we can increase the portion of matrix-matrix operations and expect higher performance. We give an overview of different techniques for the first stage. This note summarizes the results of [9, 10].

Keywords: Linear algebra; Singular value decomposition; Bidiagonal reduction; Parallel BLAS
1 Introduction
Many algorithms for computing the singular value decomposition (SVD) of a general matrix A ∈ ℝ^{m×n} start with the reduction to bidiagonal form, A → B = U^T A V, where B is upper or lower bidiagonal, and U ∈ ℝ^{m×m} and V ∈ ℝ^{n×n} are orthogonal. We assume m ≥ n and consider only the reduction to upper bidiagonal form. We can obtain B by alternately pre- and postmultiplying A with Householder transformations in order to introduce zeros in the columns and rows of the matrix. In [6], a block formulation of the bidiagonalization algorithm is given, which allows one half of the operations to be performed in matrix-matrix products (BLAS3 [7]) as well as a straightforward parallelization. We will call this reduction technique the direct method, because it reduces a dense matrix directly to bidiagonal form.

From Table 1 we see that subdividing the reduction to bidiagonal form into two stages, dense → banded (cf. [10]) and banded → bidiagonal (cf. [11]), allows the design of two-stage bidiagonalization algorithms that do the vast majority of the calculations within matrix-matrix products and therefore can make full use of an optimized BLAS3 implementation. Due to better communication management, this gain even increases if the algorithms are run on distributed memory parallel machines. The two-stage algorithms offer an attractive alternative if only the singular values are required. If the orthogonal transformations must be accumulated explicitly (e.g., to compute all the singular vectors), then the direct method is superior.
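To illustrate the alternating left and right Householder reflections described above, the following NumPy sketch performs an unblocked (level-2) bidiagonalization. It is not the blocked direct method of [6], nor the two-stage scheme; the function names house and bidiagonalize are ours, blocking is omitted, and U and V are not accumulated.

```python
import numpy as np

def house(x):
    """Return a Householder vector v and scalar beta such that
    (I - beta * v v^T) x is a multiple of the first unit vector."""
    v = np.asarray(x, dtype=float).copy()
    normx = np.linalg.norm(v)
    if normx == 0.0:
        return v, 0.0                       # nothing to annihilate
    # Add the norm with the sign of x[0] to avoid cancellation.
    v[0] += np.sign(v[0]) * normx if v[0] != 0 else normx
    beta = 2.0 / np.dot(v, v)
    return v, beta

def bidiagonalize(A):
    """Unblocked Golub-Kahan reduction of A (m >= n) to upper bidiagonal
    form B = U^T A V; only B is returned in this sketch."""
    B = np.asarray(A, dtype=float).copy()
    m, n = B.shape
    for k in range(n):
        # Left reflection: zero out column k below the diagonal.
        v, beta = house(B[k:, k])
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        if k < n - 2:
            # Right reflection: zero out row k right of the superdiagonal.
            v, beta = house(B[k, k + 1:])
            B[k:, k + 1:] -= beta * np.outer(B[k:, k + 1:] @ v, v)
    return B

if __name__ == "__main__":
    A = np.random.default_rng(0).standard_normal((8, 5))
    B = bidiagonalize(A)
    # Orthogonal transformations preserve the singular values.
    assert np.allclose(np.linalg.svd(A, compute_uv=False),
                       np.linalg.svd(B, compute_uv=False))
```

Because this version works one column and one row at a time, it is dominated by matrix-vector products; the blocked formulations discussed in the text exist precisely to replace such operations by matrix-matrix products.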
                           reduction of A            update U          update V
direct
  overall flop             4mn² − (4/3)n³            2mn(2m − n)       2n³
  BLAS3 portion            2mn² − (2/3)n³            2mn(2m − n)       2n³
two-stage, first stage
  overall flop             4mn² − (4/3)n³ + O(mnb)   2mn(2m − n)       2(n − b)³
  BLAS3 portion            4mn² − (4/3)n³ + O(mnb)   2mn(2m − n)       2(n − b)³
two-stage, second stage
  overall flop             8n²b                      2mn² + O(n²b)     2n³ + O(n²b)
  BLAS3 portion            0                         2mn²              2n³

Table 1. Approximate flop counts for the bidiagonalization methods.
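The following short Python sketch evaluates the "reduction of A" column of Table 1 for one problem size and reports the BLAS3 fraction of each method. The function name blas3_fractions is ours, and the lower-order O(mnb) and O(n²b) terms are dropped on the assumption that the bandwidth b is much smaller than n.

```python
def blas3_fractions(m, n, b):
    """Leading-term flop counts from the 'reduction of A' column of Table 1
    (U and V not accumulated); returns the BLAS3 fraction of each method."""
    direct_total = 4 * m * n**2 - (4 / 3) * n**3
    direct_blas3 = 2 * m * n**2 - (2 / 3) * n**3
    # Two-stage: the first stage is essentially all BLAS3; the second stage
    # (banded -> bidiagonal, about 8*n^2*b flops) contributes no BLAS3.
    two_stage_total = direct_total + 8 * n**2 * b   # O(mnb), O(n^2 b) omitted
    two_stage_blas3 = direct_total                   # first stage only
    return direct_blas3 / direct_total, two_stage_blas3 / two_stage_total

print(blas3_fractions(m=4000, n=4000, b=32))   # approximately (0.5, 0.98)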
In this note we briefly describe three implementations of the first stage, i.e., the reduction to banded form.