Using Pentangular Factorizations for the Reduction to Banded Form
Department of Mathematics, University of Wuppertal, D-42097 Wuppertal, Germany
Computing Center, Aachen University of Technology, D-52074 Aachen, Germany
Abstract. Most methods for computing the singular value decomposition (SVD) first bidiagonalize the matrix. The ScaLAPACK implementation of the blocked reduction of a general dense matrix to bidiagonal form performs about one half of the operations with BLAS3. If we subdivide the task into two stages, dense → banded and banded → bidiagonal, we can increase the portion of matrix-matrix operations and expect higher performance. We give an overview of different techniques for the first stage. This note summarizes the results of [9, 10].

Keywords: Linear algebra; Singular value decomposition; Bidiagonal reduction; Parallel BLAS
1 Introduction
Many algorithms for computing the singular value decomposition (SVD) of a general matrix A ∈ ℝ^{m×n} start with the reduction to bidiagonal form, A → B = U^T A V, where B is upper or lower bidiagonal, and U ∈ ℝ^{m×m} and V ∈ ℝ^{n×n} are orthogonal. We assume m ≥ n and consider only the reduction to upper bidiagonal form. We can obtain B by alternately pre- and postmultiplying A with Householder transformations in order to introduce zeros in the columns and rows of the matrix. In [6], a block formulation of the bidiagonalization algorithm is given, which allows one half of the operations to be performed in matrix-matrix products (BLAS3 [7]) as well as a straightforward parallelization. We will call this reduction technique the direct method, because it reduces a dense matrix directly to bidiagonal form.

From Table 1 we see that subdividing the reduction to bidiagonal form into two stages, dense → banded (cf. [10]) and banded → bidiagonal (cf. [11]), allows the design of two-stage bidiagonalization algorithms that do the vast majority of the calculations within matrix-matrix products and therefore can make full use of an optimized BLAS3 implementation. Due to better communication management, this gain even increases if the algorithms are run on distributed memory parallel machines. The two-stage algorithms offer an attractive alternative if only the singular values are required. If the orthogonal transformations must be accumulated explicitly (e.g., to compute all the singular vectors), then the direct method is superior.
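To illustrate the alternating left and right Householder reflections described above, the following NumPy sketch performs an unblocked (level-2) bidiagonalization. It is not the blocked direct method of [6], nor the two-stage scheme; the function names house and bidiagonalize are ours, blocking is omitted, and U and V are not accumulated.

```python
import numpy as np

def house(x):
    """Return a Householder vector v and scalar beta such that
    (I - beta * v v^T) x is a multiple of the first unit vector."""
    v = np.asarray(x, dtype=float).copy()
    normx = np.linalg.norm(v)
    if normx == 0.0:
        return v, 0.0                       # nothing to annihilate
    # Add the norm with the sign of x[0] to avoid cancellation.
    v[0] += np.sign(v[0]) * normx if v[0] != 0 else normx
    beta = 2.0 / np.dot(v, v)
    return v, beta

def bidiagonalize(A):
    """Unblocked Golub-Kahan reduction of A (m >= n) to upper bidiagonal
    form B = U^T A V; only B is returned in this sketch."""
    B = np.asarray(A, dtype=float).copy()
    m, n = B.shape
    for k in range(n):
        # Left reflection: zero out column k below the diagonal.
        v, beta = house(B[k:, k])
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        if k < n - 2:
            # Right reflection: zero out row k right of the superdiagonal.
            v, beta = house(B[k, k + 1:])
            B[k:, k + 1:] -= beta * np.outer(B[k:, k + 1:] @ v, v)
    return B

if __name__ == "__main__":
    A = np.random.default_rng(0).standard_normal((8, 5))
    B = bidiagonalize(A)
    # Orthogonal transformations preserve the singular values.
    assert np.allclose(np.linalg.svd(A, compute_uv=False),
                       np.linalg.svd(B, compute_uv=False))
```

Because this version works one column and one row at a time, it is dominated by matrix-vector products; the blocked formulations discussed in the text exist precisely to replace such operations by matrix-matrix products.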
                           reduction of A            update U          update V
direct
  overall flop             4mn² − (4/3)n³            2mn(2m − n)       2n³
  BLAS3 portion            2mn² − (2/3)n³            2mn(2m − n)       2n³
two-stage, first stage
  overall flop             4mn² − (4/3)n³ + O(mnb)   2mn(2m − n)       2(n − b)³
  BLAS3 portion            4mn² − (4/3)n³ + O(mnb)   2mn(2m − n)       2(n − b)³
two-stage, second stage
  overall flop             8n²b                      2mn² + O(n²b)     2n³ + O(n²b)
  BLAS3 portion            0                         2mn²              2n³

Table 1. Approximate flop counts for the bidiagonalization methods.
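The following short Python sketch evaluates the "reduction of A" column of Table 1 for one problem size and reports the BLAS3 fraction of each method. The function name blas3_fractions is ours, and the lower-order O(mnb) and O(n²b) terms are dropped on the assumption that the bandwidth b is much smaller than n.

```python
def blas3_fractions(m, n, b):
    """Leading-term flop counts from the 'reduction of A' column of Table 1
    (U and V not accumulated); returns the BLAS3 fraction of each method."""
    direct_total = 4 * m * n**2 - (4 / 3) * n**3
    direct_blas3 = 2 * m * n**2 - (2 / 3) * n**3
    # Two-stage: the first stage is essentially all BLAS3; the second stage
    # (banded -> bidiagonal, about 8*n^2*b flops) contributes no BLAS3.
    two_stage_total = direct_total + 8 * n**2 * b   # O(mnb), O(n^2 b) omitted
    two_stage_blas3 = direct_total                   # first stage only
    return direct_blas3 / direct_total, two_stage_blas3 / two_stage_total

print(blas3_fractions(m=4000, n=4000, b=32))   # approximately (0.5, 0.98)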
In this note we briefly describe three implementations of the first stage, i.e., the reduction to banded form.