Efficient Floating-Point Givens Rotation Unit

PDF / 1,076,550 Bytes
24 Pages / 439.37 x 666.142 pts Page_size
83 Downloads / 222 Views

Efficient Floating-Point Givens Rotation Unit Javier Hormigo1

· Sergio D. Muñoz1

Received: 11 September 2019 / Revised: 19 October 2020 / Accepted: 22 October 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract High-throughput QR decomposition is a key operation in many advanced signal processing and communication applications. For some of these applications, using floating-point computation is becoming almost compulsory. However, there are scarce works in hardware implementations of floating-point QR decomposition for embedded systems. In this paper, we propose a very efficient high-throughput floating-point Givens rotation unit for QR decomposition. Moreover, the initial proposed design for conventional number formats is enhanced by using the new Half-Unit Biased format. The provided error analysis shows the effectiveness of our proposals and the trade-off of different implementation parameters. We also present FPGA implementation results and a thorough comparison between both approaches. These implementation results also reveal outstanding improvements compared to other previous similar designs in terms of area, latency, and throughput. Keywords Signal processing · Advanced communication · QR Decomposition · Floating-point · HUB format · High-throughput · CORDIC

1 Introduction Nowadays, most advanced signal processing applications, such as wireless communications, graphics, industrial control, and medical imaging, strongly rely on linear algebra algorithms. Some of these algorithms require performing QR Decomposition (QRD) [3,10,26]. QRD decomposes an input matrix Am×n into two new matrices, Q m×m and R m×n , whose product is equal to A. Furthermore, it is fulfilled R m×n is an upper triangular matrix, and Q m×m is an orthogonal matrix. QRD or QR factorization is a very compute-intensive operation that requires high-throughput in many applica-

B

Javier Hormigo [email protected] Sergio D. Muñoz [email protected]

1

The Department of Computer Architecture, Universidad de Málaga, Málaga 29071, Spain

Circuits, Systems, and Signal Processing

tions. Due to that reason, many researchers have investigated the implementation of QRD on hardware for embedded systems. There are several methods to compute QRD, but the Givens Rotation Method (and its variations) is probably the most widely used to implement QRD for embedded systems. This is due to its robust numerical properties and its easy parallelization. The Givens Rotation Method is based on a unitary transformation, called Givens rotation, which allows inserting a zero element at a selected location of a matrix. Then, following a predefined schedule, the input matrix is transformed into an upper triangular matrix R by successive Givens rotations, whereas the same rotations over the identity matrix produce an orthogonal matrix Q. To perform each Givens rotation, first, the rotation angle θ , which allows zeroing an element, has to be computed by using the first nonzero pair of elements of the two target rows. Then, all pairs of element

Data Loading...

Efficient Floating-Point Givens Rotation Unit

Recommend Documents

Rotation

Rotation

Design of an Area Efficient and Low Power MAC Unit

Energy-Efficient Ternary Arithmetic Logic Unit Design in CNTFET Technology

Efficient Packings of Unit Squares in a Large Square

JOB ROTATION

Rotation Estimation

Varimax Rotation

Imperial Rotation

UNIT

Rotation Angle

Efficient Relative Pose Estimation for Cameras and Generalized Cameras in Case of Known Relative Rotation Angle