Performance-driven parallel reconfigurable computing architecture for multi-standard video decoding



Chi-Chou Kao 1

* Chi-Chou Kao, [email protected]

1 Department of Computer Science and Information Engineering, National University of Tainan, Tainan, Taiwan

Received: 14 August 2019 / Revised: 28 July 2020 / Accepted: 31 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Video processing applications often require high computing capacity while facing performance and power constraints, especially in portable devices, and general-purpose processors can no longer meet these requirements. This paper presents a parallel reconfigurable computing architecture consisting of reconfigurable processing units connected by an area-efficient routing structure. Hierarchical configuration contexts reduce the implementation overhead and the energy dissipated on fast reconfiguration. The proposed architecture targets multi-standard video processing and delivers performance comparable to that of fixed-function ASICs through deep pipelining and a large amount of computing parallelism. The experimental results show that the proposed architecture offers strong performance and practicality.

Keywords Reconfigurable processing · Performance · Power · Parallel · Multiple-standard video processing

1 Introduction

In multimedia applications on embedded devices, codecs are used to adapt to different requirements and conditions for image storage and transmission, including transmission speed, delay, bandwidth, image quality, resolution, color, bit rate, and picture update rate per second (FPS). Various standards have been developed for this purpose, such as H.263, H.264, H.265, MPEG-2, MPEG-4, and AVS [1, 4, 20]. Figure 1 shows the general architecture shared by these video coding and decoding standards. First, image compression converts the color space from RGB to YCbCr and divides each frame into blocks for motion compensation and the discrete cosine transform. Next, quantization and entropy coding generate the compressed output. Finally, the decoded video is obtained by reversing the operations of compression.
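To make the first step of this pipeline concrete, the following C sketch converts one RGB pixel to YCbCr. The BT.601 full-range coefficients and the function name rgb_to_ycbcr are illustrative assumptions; the paper only states that RGB is converted to YCbCr without specifying a particular conversion matrix.

#include <stdint.h>

/* Clamp a real value to the 8-bit sample range with rounding. */
static uint8_t clamp_u8(double v)
{
    if (v < 0.0)   return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* RGB -> YCbCr for one pixel, using BT.601 full-range coefficients
 * (an assumption made for this sketch). */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = clamp_u8( 0.299    * r + 0.587    * g + 0.114    * b);
    *cb = clamp_u8(-0.168736 * r - 0.331264 * g + 0.5      * b + 128.0);
    *cr = clamp_u8( 0.5      * r - 0.418688 * g - 0.081312 * b + 128.0);
}

In an encoder this routine would be applied per pixel before the frame is divided into blocks for motion compensation and the discrete cosine transform; the decoder applies the inverse conversion as part of reversing the compression steps.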


In Figure 1, the + symbol indicates that two or more signals are summed. Clearly, video codecs demand a large amount of computation, high processing speed, and real-time operation. With the rapid growth of mobile devices, computing power and performance must be improved while meeting constraints on energy consumption, cost, and flexibility. One important technique in video processing is the computational parallelism of data streaming, which distributes the computational work across multiple processing elements. In many cases the work is a form of stream processing that can be broken down into multiple stages, and these stages can be overlapped on multiple computing resources so that different parts of the data set are processed in parallel, as sketched below. For these reasons, a key requirement of such computing architectures is that dataflow applications be designed to exploit efficiently and map onto the parallelism of the available hardware.
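The sketch below illustrates this kind of stage overlapping under simple assumptions: three hypothetical decode stages (entropy decoding, inverse transform, motion compensation), one processing element per stage, and one block processed per stage per time step. It only models the schedule; the stage names are placeholders rather than the paper's actual mapping.

#include <stdio.h>

#define NUM_BLOCKS 8
#define NUM_STAGES 3

/* Placeholder stage labels; in a real decoder these would be the
 * computational phases mapped onto separate processing elements. */
static const char *stage_name[NUM_STAGES] = {
    "entropy decode", "inverse transform", "motion compensation"
};

int main(void)
{
    /* At time step t, stage s works on block (t - s).  Once t reaches
     * NUM_STAGES - 1 the pipeline is full and every processing element
     * is busy with a different block in the same time step. */
    for (int t = 0; t < NUM_BLOCKS + NUM_STAGES - 1; ++t) {
        printf("time step %d:\n", t);
        for (int s = 0; s < NUM_STAGES; ++s) {
            int block = t - s;
            if (block >= 0 && block < NUM_BLOCKS)
                printf("  PE%d: %s on block %d\n", s, stage_name[s], block);
        }
    }
    return 0;
}

In steady state all processing elements are active in every time step, which is the source of the throughput gain over a purely sequential implementation.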