Motion Estimation and Signaling Techniques for 2D+t Scalable Video Coding



M. Tagliasacchi, D. Maestroni, S. Tubaro, and A. Sarti

Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 Milano, Italy

Received 1 March 2005; Revised 5 August 2005; Accepted 12 September 2005

We describe a fully scalable wavelet-based 2D+t (in-band) video coding architecture. We propose new coding tools specifically designed for this framework, with two goals: reducing the computational complexity at the encoder without sacrificing compression, and improving the coding efficiency, especially at low bitrates. To this end, we focus our attention on motion estimation and motion vector encoding. We propose a fast motion estimation algorithm that works in the wavelet domain and exploits the geometrical properties of the wavelet subbands. We show that its computational complexity grows linearly with the size of the search window, while approaching the performance of a full search strategy. We extend the proposed motion estimation algorithm to work with blocks of variable sizes, in order to better capture local motion characteristics, thus improving rate-distortion behavior. Given this motion field representation, we propose a motion vector coding algorithm that adaptively scales the motion bit budget according to the target bitrate, improving the coding efficiency at low bitrates. Finally, we show how to optimally scale the motion field when the sequence is decoded at reduced spatial resolution. Experimental results illustrate the advantages of each individual coding tool presented in this paper. Based on these simulations, we define the best configuration of coding parameters and we compare the proposed codec with MC-EZBC, a widely used reference codec implementing the t+2D framework.

Copyright © 2006 Hindawi Publishing Corporation. All rights reserved.

1. INTRODUCTION

Today’s video streaming applications require codecs to provide a bitstream that can be flexibly adapted to the characteristics of the network and the receiving device. Such codecs are expected to fulfill scalability requirements, so that encoding is performed only once while decoding can take place each time at a different spatial resolution, frame rate, and bitrate. Consider, for example, streaming video content to TV sets, PDAs, and cellphones at the same time. Each device has its own constraints in terms of bandwidth, display resolution, and battery life. For this reason, it would be useful for end users to subscribe to a scalable video stream, so that a representation of the video content matching the device characteristics can be extracted at decoding time. Wavelet-based video codecs have proved to fit this application scenario naturally, by decomposing the video sequence into a plurality of spatio-temporal subbands. Combined with an embedded entropy coding of wavelet coefficients such as JPEG2000 [1], SPIHT (set partitioning in hierarchical trees) [2], EZBC (embedded zeroblock codin