Efficient Floating-Point Givens Rotation Unit
- PDF / 1,076,550 Bytes
- 24 Pages / 439.37 x 666.142 pts Page_size
- 83 Downloads / 209 Views
Efficient Floating-Point Givens Rotation Unit Javier Hormigo1
· Sergio D. Muñoz1
Received: 11 September 2019 / Revised: 19 October 2020 / Accepted: 22 October 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract High-throughput QR decomposition is a key operation in many advanced signal processing and communication applications. For some of these applications, using floating-point computation is becoming almost compulsory. However, there are scarce works in hardware implementations of floating-point QR decomposition for embedded systems. In this paper, we propose a very efficient high-throughput floating-point Givens rotation unit for QR decomposition. Moreover, the initial proposed design for conventional number formats is enhanced by using the new Half-Unit Biased format. The provided error analysis shows the effectiveness of our proposals and the trade-off of different implementation parameters. We also present FPGA implementation results and a thorough comparison between both approaches. These implementation results also reveal outstanding improvements compared to other previous similar designs in terms of area, latency, and throughput. Keywords Signal processing · Advanced communication · QR Decomposition · Floating-point · HUB format · High-throughput · CORDIC
1 Introduction Nowadays, most advanced signal processing applications, such as wireless communications, graphics, industrial control, and medical imaging, strongly rely on linear algebra algorithms. Some of these algorithms require performing QR Decomposition (QRD) [3,10,26]. QRD decomposes an input matrix Am×n into two new matrices, Q m×m and R m×n , whose product is equal to A. Furthermore, it is fulfilled R m×n is an upper triangular matrix, and Q m×m is an orthogonal matrix. QRD or QR factorization is a very compute-intensive operation that requires high-throughput in many applica-
B
Javier Hormigo [email protected] Sergio D. Muñoz [email protected]
1
The Department of Computer Architecture, Universidad de Málaga, Málaga 29071, Spain
Circuits, Systems, and Signal Processing
tions. Due to that reason, many researchers have investigated the implementation of QRD on hardware for embedded systems. There are several methods to compute QRD, but the Givens Rotation Method (and its variations) is probably the most widely used to implement QRD for embedded systems. This is due to its robust numerical properties and its easy parallelization. The Givens Rotation Method is based on a unitary transformation, called Givens rotation, which allows inserting a zero element at a selected location of a matrix. Then, following a predefined schedule, the input matrix is transformed into an upper triangular matrix R by successive Givens rotations, whereas the same rotations over the identity matrix produce an orthogonal matrix Q. To perform each Givens rotation, first, the rotation angle θ , which allows zeroing an element, has to be computed by using the first nonzero pair of elements of the two target rows. Then, all pairs of element
Data Loading...