A FAST CELLULAR METHOD OF MATRIX MULTIPLICATION
UDC 681.322.012
L. D. Jelfimova
This paper proposes a cellular method of matrix multiplication. The method reduces the multiplicative and additive complexities of well-known matrix multiplication algorithms by 12.5%. The computational complexities of cellular analogs of such algorithms are estimated. A fast cellular analog is presented whose multiplicative and additive complexities are equal to ≈ 0.382n³ multiplications and ≈ 1.147n³ additions, respectively, where n is the order of the matrices being multiplied. Keywords: cellular methods of linear algebra, matrix multiplication algorithm, parallel computations.
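The 12.5% figure quoted in the abstract is consistent with replacing the 8 cell products of a 2 × 2 block decomposition by 7, as in one Strassen-style recursion step applied at the cell level (this reading is an interpretation, not stated explicitly in the abstract); the arithmetic can be checked directly:

```python
# One level of 2x2 block decomposition: classical block multiplication
# requires 8 cell products, while a Strassen-style step requires only 7.
classical_cell_products = 8
strassen_cell_products = 7

reduction = 1 - strassen_cell_products / classical_cell_products
print(reduction)  # 0.125, i.e. a 12.5% reduction in cell multiplications
```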
The problem of embedding algorithms of arbitrary size into the architectures of concrete computing systems is solved by decomposing a large-scale computational process into a set of interconnected subprocesses of smaller size at different levels of representation of computations. The most promising decomposition is performed at the level of the initial algorithms and produces cellular algorithms whose basic operations are operations over matrix cells. Well-known cellular algorithms [1, 2] require the execution of only two types of operations, namely, the basic operation of matrix multiplication with addition, AB + D = C (where A, B, C, and D are square matrices of cell size), and nonstandard operations that constitute only a small percentage of the total number of operations of an algorithm. Such cellular algorithms for solving practically all problems of linear algebra and many problems of computational mathematics are implemented on various computing systems and also on special processors, including fast calculators that execute the cellular operation AB + D = C and can be based on well-known fast matrix multiplication algorithms [3–5]. The basic operation is realized most efficiently on VLSI calculators, in particular, on processor arrays with a systolic organization of computations. At present, based on an integrated approach to the design of such arrays [6], VLSI-oriented versions of the fast Winograd [3] and Strassen [4] algorithms and of a new fast algorithm [5] have been developed, and fast systolic arrays of various architectures [6–8] have been synthesized that efficiently realize these VLSI algorithms of matrix multiplication.
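As an illustration of how the basic cellular operation AB + D = C drives a cell-level matrix product, a minimal sketch in Python with NumPy follows. The cell size s and the helper names cell_mul_add and cellular_matmul are illustrative choices for this sketch, not notation from the paper; in a real calculator the inner AB + D = C step would itself be a fast (e.g. Winograd- or Strassen-type) routine rather than a plain product.

```python
import numpy as np

def cell_mul_add(A, B, D):
    """Basic cellular operation: return C = A @ B + D on square cells."""
    return A @ B + D

def cellular_matmul(X, Y, s):
    """Multiply n x n matrices X and Y by decomposing them into s x s cells
    and accumulating each output cell with the basic operation AB + D = C."""
    n = X.shape[0]
    assert n % s == 0, "cell size must divide the matrix order"
    m = n // s  # number of cells per dimension
    C = np.zeros((n, n))
    for i in range(m):
        for j in range(m):
            # Accumulate cell C[i, j] = sum_k X[i, k] @ Y[k, j],
            # one basic operation AB + D = C per term.
            for k in range(m):
                A = X[i*s:(i+1)*s, k*s:(k+1)*s]
                B = Y[k*s:(k+1)*s, j*s:(j+1)*s]
                C[i*s:(i+1)*s, j*s:(j+1)*s] = cell_mul_add(
                    A, B, C[i*s:(i+1)*s, j*s:(j+1)*s])
    return C
```

For example, cellular_matmul(X, Y, 2) on 4 × 4 inputs performs eight cell products and agrees with the ordinary product X @ Y.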
Cellular algorithms [9–12] are also well known that are obtained by replacing scalar operations over numbers with operations over matrix cells, or by reordering sets of scalar operations so that macrooperations over matrix cells can be identified. If an algorithm allows freedom in choosing the scale of parallelism, then passing to large-scale parallelism at the level of cellular operations makes it possible to achieve a higher degree of parallelization of computations and equipment workload and to decrease the expenditure on data exchange. In particular, to realize the fast Strassen algo