Pipeline synthesis and optimization of FPGA-based video processing applications with CAL

RESEARCH

Open Access

Ab Al-Hadi Ab Rahman*, Anatoly Prihozhy and Marco Mattavelli

Abstract
This article describes a pipeline synthesis and optimization technique that increases the data throughput of FPGA-based systems while using minimal pipeline resources. The technique is applied to the CAL dataflow language and is built on relations, matrices, and graphs. First, the initial as-soon-as-possible (ASAP) and as-late-as-possible (ALAP) schedules and the corresponding mobility of operators are generated. From these, an operator coloring technique is applied to conflict and non-conflict directed graphs using recursive functions and explicit stack mechanisms. For each feasible number of pipeline stages, the pipeline schedule with the minimum total register width is taken as the optimal coloring, which is then automatically transformed into a CAL description. The generated pipelined CAL descriptions are finally synthesized to hardware description languages for FPGA implementation. Experimental results for three video processing applications show up to 3.9× higher throughput for pipelined compared with non-pipelined implementations, and reductions in average total pipeline register width of up to 39.6% and 49.9% for the optimal schedules compared with the ASAP and ALAP schedules, respectively.

1 Introduction
Data throughput is one of the most important parameters of video processing systems. It measures how fast data passes from the input to the output of a system. With growing demand for higher-resolution images, faster frame rates, and more processing through advanced algorithms, meeting ever-increasing throughput requirements is becoming a major challenge. For algorithms that can be performed in parallel, as is the case for most digital signal processing (DSP) applications, parallel platforms such as multi-core CPUs, many-core GPUs, and FPGAs generally achieve higher throughput than traditional single-core systems.
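The scheduling steps summarized in the abstract (ASAP and ALAP stage schedules, operator mobility, and selection of the schedule with minimum total register width for a given number of stages) can be illustrated with a small Python sketch. Everything in it is a hypothetical simplification, not the authors' algorithm: each operator carries an assumed delay and output width, a stage may hold any chain of operators whose delays sum to at most `T_MAX`, and a value costs one register of its width for every stage boundary it crosses. ASAP and ALAP stages are found by greedy packing and mobility is their difference. In place of the paper's coloring of conflict and non-conflict graphs with recursion and explicit stacks, the sketch substitutes plain brute-force enumeration within the mobility ranges, which is only practical for tiny graphs. The operator names `add`, `mul`, `sat`, and `scale`, their widths, and their delays are invented for illustration.

```python
from itertools import product

T_MAX = 4                  # hypothetical delay budget of one pipeline stage
INPUTS = {"x": 8, "y": 8}  # primary inputs -> bit width, available in stage 0
# operator -> (delay, output width in bits, predecessors), in topological order
OPS = {
    "add": (2, 9, ["x", "y"]),
    "mul": (3, 16, ["x", "y"]),
    "sat": (2, 8, ["add", "mul"]),  # hypothetical clip8(mul - add)
    "scale": (1, 10, ["sat"]),      # hypothetical sat * 3
}
# value (input or operator) -> operators that consume it
SUCC = {v: [w for w, (_, _, ps) in OPS.items() if v in ps]
        for v in [*INPUTS, *OPS]}


def op_preds(v):
    """Predecessors of v that are operators (not primary inputs)."""
    return [u for u in OPS[v][2] if u in OPS]


def asap():
    """Earliest stage per operator: pack chains greedily into T_MAX."""
    st, fin = {}, {}
    for v, (d, _, _) in OPS.items():
        s = max((st[u] for u in op_preds(v)), default=0)
        start = max((fin[u] for u in op_preds(v) if st[u] == s), default=0)
        if start + d > T_MAX:        # does not fit: open the next stage
            s, start = s + 1, 0
        st[v], fin[v] = s, start + d
    return st


def alap(n_stages):
    """Latest stage per operator for a pipeline of n_stages stages."""
    st, tail = {}, {}
    for v in reversed(list(OPS)):
        d = OPS[v][0]
        s = min((st[w] for w in SUCC[v]), default=n_stages - 1)
        t = d + max((tail[w] for w in SUCC[v] if st[w] == s), default=0)
        if t > T_MAX:                # does not fit: move to the previous stage
            s, t = s - 1, d
        st[v], tail[v] = s, t
    return st


def feasible(st):
    """Precedence holds and no in-stage chain of delays exceeds T_MAX."""
    fin = {}
    for v, (d, _, _) in OPS.items():
        if any(st[u] > st[v] for u in op_preds(v)):
            return False
        fin[v] = d + max((fin[u] for u in op_preds(v) if st[u] == st[v]), default=0)
        if fin[v] > T_MAX:
            return False
    return True


def register_width(st, n_stages):
    """Total pipeline-register bits: each value is delayed once per stage
    boundary between its producer and its last consumer (or the pipeline
    end, for outputs)."""
    stage = {**{x: 0 for x in INPUTS}, **st}
    width = {**INPUTS, **{v: w for v, (_, w, _) in OPS.items()}}
    return sum(width[v] * (max((st[w] for w in SUCC[v]), default=n_stages - 1) - stage[v])
               for v in stage)


def optimal(n_stages):
    """Brute force: try every stage within each operator's mobility range."""
    lo, hi = asap(), alap(n_stages)
    names, best = list(OPS), None
    for choice in product(*(range(lo[v], hi[v] + 1) for v in names)):
        st = dict(zip(names, choice))
        if feasible(st):
            cost = register_width(st, n_stages)
            if best is None or cost < best[0]:
                best = (cost, st)
    return best


if __name__ == "__main__":
    S = 3
    lo, hi = asap(), alap(S)
    print("mobility:", {v: hi[v] - lo[v] for v in OPS})
    print("ASAP:", register_width(lo, S), "ALAP:", register_width(hi, S))
    print("optimal:", optimal(S))
```

With these invented numbers, ASAP needs two stages. For three stages every operator has mobility 1, the ASAP and ALAP schedules cost 35 and 41 register bits, and the search finds a 33-bit schedule that keeps `add` and `mul` in stage 0, `sat` in stage 1, and `scale` in stage 2. This toy case shows why the minimum-width schedule need not coincide with either ASAP or ALAP.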
Among these parallel platforms, FPGAs allow the most parallel operations and the greatest flexibility in programming parallel cores. However, register transfer level (RTL) design for FPGAs is known to be difficult and time-consuming, especially for complex algorithms [1]. As time-to-market windows continue to shrink, a high-level programming approach that synthesizes to efficient parallel hardware is required to manage complexity and increase productivity.

* Correspondence: [email protected]
SCI-STI-MM, Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

The CAL dataflow language [2] was developed to address these issues, with the specific goal of synthesizing high-level programs into efficient parallel hardware (see Section 3.2). CAL is an actor language in which programs execute by consuming and producing tokens; it is therefore well suited to data-intensive algorithms, such as those in DSP, that operate on streams of data. The language was also chosen by ISO/IEC^a as a language for the description and specificatio