Efficient Implementation of Nested-Loop Multimedia Algorithms

PDF / 1,024,507 Bytes
18 Pages / 600 x 792 pts Page_size
53 Downloads / 223 Views

fficient Implementation of Nested-Loop Multimedia Algorithms Surin Kittitornkun Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53706, USA Email: [email protected]

Yu Hen Hu Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53706, USA Email: [email protected] Received 22 June 2001 and in revised form 22 August 2001 A novel dependence graph representation called the multiple-order dependence graph for nested-loop formulated multimedia signal processing algorithms is proposed. It allows a concise representation of an entire family of dependence graphs. This powerful representation facilitates the development of innovative implementation approach for nested-loop formulated multimedia algorithms such as motion estimation, matrix-matrix product, 2D linear transform, and others. In particular, algebraic linear mapping (assignment and scheduling) methodology can be applied to implement such algorithms on an array of simple-processing elements. The feasibility of this new approach is demonstrated in three major target architectures: application-speciﬁc integrated circuit (ASIC), ﬁeld programmable gate array (FPGA), and a programmable clustered VLIW processor. Keywords and phrases: dependence graph, systolic array, multiple-order, FPGA, VLIW.

1. INTRODUCTION Many data-intensive multimedia algorithms such as motion estimation, 2D DCT/IDCT, matrix multiplication, and others consist of mainly deep nested-loops with relatively simpleloop body. They have been painstakingly implemented manually as hardwired ASICs (application speciﬁc integrated circuits) [1, 2] in order to meet the extremely high throughput rate demand of video signal processing algorithms. In view of the repetitive nature of the nested-loop algorithm formulation, the most effective strategy to speed up computation is to realize such an algorithm using an array of processing elements (PEs) to facilitate parallel processing. In 1967, Karp et al. [3] introduced the notion of uniform recurrence equation as a powerful abstraction to describe nested-loop algorithm formulations. Lamport [4] considered the parallel execution of Do loops and proposed to use the algebraic construct of a hyperplane in the iteration index space to characterize the iterations that can be executed simultaneously. In 1982, Kung [5] and Leiserson [6] used the term systolic array to describe the pipelined execution of nestedloop formulated algorithms in a regular-structured, locally connected array of identical PEs. It has been argued that a systolic array structure is amenable for very large scale integration circuit (VLSI) implementation as it exploits maximum

parallelism while requiring only local communication among processors. In the 1980s, Kung [7] and others [8, 9] have developed systematic design methodologies for systolic arrays. These methods facilitate algebraic mapping of the iteration indices of a nested-loop onto a systolic array with a linear task assignment and a linear schedule implementation. With

Data Loading...

Efficient Implementation of Nested-Loop Multimedia Algorithms

Recommend Documents

Efficient implementation of quantum arithmetic operation circuits

Design and Implementation of Multimedia Social Services on Elgg

Low-Rank Approximation Algorithms, Implementation, Applications

Low Rank Approximation Algorithms, Implementation, Applications

Efficient Procedures for Using Matrix Algorithms

Efficient Predictive Algorithms for Image Compression

Efficient Algorithms for Approximate Smooth Selection

Efficient Graded Teaching Scheme Based on Multimedia Environment

Load Prediction Energy Efficient VM Consolidation Policy in Multimedia Cloud

Energy-Efficient Mobile Middleware for SIP on Ubiquitous Multimedia Systems

An Efficient Privacy Scheme Based on Smart Multimedia Devices

Efficient FPGA Implementation of FIR Filter Using Distributed Arithmetic