A high throughput two-dimensional discrete cosine transform and MPEG4 motion estimation using vector coprocessor
ORIGINAL RESEARCH PAPER
Shahrukh Agha1 · Usman Ali Gulzari2 · Farzana Shaheen3 · Farmanullah Jan4
Received: 16 October 2018 / Accepted: 15 June 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019
Abstract
In this work, a configurable and scalable vector coprocessor for real-time processing of MPEG4 motion estimation (ME) and the two-dimensional discrete cosine transform (2D DCT) is presented. A sequential DSP processor based on a reduced instruction set computer (RISC) architecture would require a clock frequency of 15 GHz to perform these two processes in real time for a common intermediate format (CIF) sized sequence at 25 frames per second (fps). This frequency requirement increases further as the image dimensions increase. In contrast, our architecture on FPGA achieves the real-time processing rate for combined ME and 2D DCT at a low frequency for a CIF sized sequence and at a higher frequency for a full high definition (FHD) sequence. Owing to the configurable nature of the architecture and of the FPGA, it can be extended to image sequences of larger dimensions. An important aspect of the architecture is that the same datapath used for ME is also used for 2D DCT, with minor modification, which saves both area and time. In addition, the processor–coprocessor architecture has lower energy consumption and cost than the sequential processor.

Keywords Motion estimation · Two dimensional discrete cosine transformation · Vector coprocessor · FPGA · High throughput
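To give a feel for why a sequential processor struggles with this workload, the following is a minimal back-of-envelope sketch of the arithmetic cost of exhaustive (full-search) block-matching ME based on the sum of absolute differences (SAD). The CIF resolution and the frame rate come from the text; the macroblock size, the search range, and the three-operations-per-pixel cost model are illustrative assumptions, and the function name is hypothetical. The sketch ignores memory accesses, search-window clipping at frame borders, and the 2D DCT, so it is not a reconstruction of the 15 GHz figure quoted above.

```python
# Back-of-envelope operation count for full-search block-matching ME.
# Only the frame size and frame rate come from the text; everything else
# is an illustrative assumption.

FRAME_W, FRAME_H = 352, 288   # CIF luma resolution (from the text)
FPS = 25                      # frame rate (from the text)
MB = 16                       # macroblock size in pixels (assumed)
SEARCH_RANGE = 16             # +/- search range in pixels (assumed)
OPS_PER_PIXEL = 3             # subtract, absolute value, accumulate (assumed)


def full_search_ops_per_second(width, height, fps, mb, search_range, ops_per_pixel):
    """Return arithmetic operations per second for an exhaustive SAD search."""
    macroblocks = (width // mb) * (height // mb)   # blocks per frame
    candidates = (2 * search_range + 1) ** 2       # displacements tested per block
    pixels_per_sad = mb * mb                       # pixel pairs compared per candidate
    return macroblocks * candidates * pixels_per_sad * ops_per_pixel * fps


ops = full_search_ops_per_second(
    FRAME_W, FRAME_H, FPS, MB, SEARCH_RANGE, OPS_PER_PIXEL
)
print(f"{ops / 1e9:.2f} billion operations per second")
```

Under these assumptions the count is on the order of billions of operations per second for ME alone. Because the cost scales with the number of macroblocks, it grows in proportion to the frame area when the same search range is kept.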
* Shahrukh Agha [email protected]

1 Department of Electrical Engineering, COMSATS University, Islamabad, Pakistan
2 Department of Electrical Engineering, University of Lahore, Islamabad Campus, Islamabad, Pakistan
3 Department of Physics, COMSATS University, Islamabad, Pakistan
4 Department of Computer Science, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia

1 Introduction
The real-time Dynamic Instruction Count (DIC) of the Motion Estimation (ME) [1] process in the MPEG (Moving Picture Experts Group) compression standard, for a rate of 25 CIF (352 × 288 pixel) sized frames per second on a sequential processor, involves billions of arithmetic operations and millions of memory accesses [1–4]. Achieving this throughput on an ordinary sequential processor is difficult for two main reasons: external memory access delay and high power consumption [1]. For battery-powered applications, low power consumption becomes important, especially when the running application has a real-time constraint. Several VLSI architectures and DSP-based systems for ME have been presented in the literature [1–23] to achieve the required throughput. In the MPEG4 compression standard, after motion estimation the second most computationally complex part is the two-dimensional discrete cosine transformation (DCT) and inverse discrete cosine transformation [24–31] followed by variable length encoding or a