Efficient Implementation of Nested-Loop Multimedia Algorithms

  • PDF / 1,024,507 Bytes
  • 18 Pages / 600 x 792 pts Page_size
  • 53 Downloads / 174 Views

DOWNLOAD

REPORT


fficient Implementation of Nested-Loop Multimedia Algorithms Surin Kittitornkun Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53706, USA Email: [email protected]

Yu Hen Hu Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53706, USA Email: [email protected] Received 22 June 2001 and in revised form 22 August 2001 A novel dependence graph representation called the multiple-order dependence graph for nested-loop formulated multimedia signal processing algorithms is proposed. It allows a concise representation of an entire family of dependence graphs. This powerful representation facilitates the development of innovative implementation approach for nested-loop formulated multimedia algorithms such as motion estimation, matrix-matrix product, 2D linear transform, and others. In particular, algebraic linear mapping (assignment and scheduling) methodology can be applied to implement such algorithms on an array of simple-processing elements. The feasibility of this new approach is demonstrated in three major target architectures: application-specific integrated circuit (ASIC), field programmable gate array (FPGA), and a programmable clustered VLIW processor. Keywords and phrases: dependence graph, systolic array, multiple-order, FPGA, VLIW.

1. INTRODUCTION Many data-intensive multimedia algorithms such as motion estimation, 2D DCT/IDCT, matrix multiplication, and others consist of mainly deep nested-loops with relatively simpleloop body. They have been painstakingly implemented manually as hardwired ASICs (application specific integrated circuits) [1, 2] in order to meet the extremely high throughput rate demand of video signal processing algorithms. In view of the repetitive nature of the nested-loop algorithm formulation, the most effective strategy to speed up computation is to realize such an algorithm using an array of processing elements (PEs) to facilitate parallel processing. In 1967, Karp et al. [3] introduced the notion of uniform recurrence equation as a powerful abstraction to describe nested-loop algorithm formulations. Lamport [4] considered the parallel execution of Do loops and proposed to use the algebraic construct of a hyperplane in the iteration index space to characterize the iterations that can be executed simultaneously. In 1982, Kung [5] and Leiserson [6] used the term systolic array to describe the pipelined execution of nestedloop formulated algorithms in a regular-structured, locally connected array of identical PEs. It has been argued that a systolic array structure is amenable for very large scale integration circuit (VLSI) implementation as it exploits maximum

parallelism while requiring only local communication among processors. In the 1980s, Kung [7] and others [8, 9] have developed systematic design methodologies for systolic arrays. These methods facilitate algebraic mapping of the iteration indices of a nested-loop onto a systolic array with a linear task assignment and a linear schedule implementation. With