MPEG-2 Compressed-Domain Algorithms for Video Analysis



Wolfgang Hesseler and Stefan Eickeler
Fraunhofer IMK, Schloss Birlinghoven, 53754 Sankt Augustin, Germany

Received 1 September 2004; Revised 2 June 2005; Accepted 6 June 2005

This paper presents new algorithms for extracting metadata from video sequences in the MPEG-2 compressed domain. Three algorithms for efficient low-level metadata extraction in preprocessing stages are described. The first algorithm detects camera motion using the motion vector field of an MPEG-2 video. The second method extends the idea of motion detection to a limited region of interest, yielding an efficient algorithm to track objects inside video sequences. The third algorithm performs a cut detection using macroblock types and motion vectors.

Copyright © 2006 Hindawi Publishing Corporation. All rights reserved.
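To give a flavor of how the first algorithm's input can be used, the sketch below estimates a dominant global translation from a macroblock motion vector field by taking the component-wise median, which is robust to foreground-object outliers. This is an illustrative assumption, not the paper's actual camera-motion algorithm; the field data is synthetic.

```python
# Illustrative sketch (NOT the paper's method): estimate a global
# translation from an MPEG-2 motion vector field via the component-wise
# median, which tolerates a minority of foreground-object outliers.

def dominant_translation(motion_vectors):
    """motion_vectors: list of (dx, dy) pairs, one per macroblock."""
    def median(values):
        s = sorted(values)
        n = len(s)
        mid = n // 2
        return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2
    dxs = [v[0] for v in motion_vectors]
    dys = [v[1] for v in motion_vectors]
    return median(dxs), median(dys)

# Mostly a pan to the right (+4, 0), plus a few moving-object outliers:
field = [(4, 0)] * 20 + [(-12, 5), (9, -7), (0, 0)]
print(dominant_translation(field))  # -> (4, 0)
```

A real compressed-domain method would additionally weight vectors by macroblock type and fit a richer motion model (pan, tilt, zoom), but the median already illustrates why the vector field alone carries useful camera-motion information.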

1. INTRODUCTION

The demand for indexing techniques capable of handling increasing amounts of video at low cost can only be satisfied using automatic methods for metadata extraction. The semantics of metadata range from low level through mid-level to high level. Low-level metadata is information that can be extracted more or less directly from the video signal. Typical examples of low-level metadata are the color histogram of an image or spectrogram features of an audio waveform. In most cases, low-level features are used for simple query-by-example applications or as a preprocessing step to generate mid- and high-level information.

Mid-level metadata is more understandable for humans and can be derived from the images of a video sequence using pattern classification methods. Examples are the location of human faces in video sequences or the spoken words in the audio channel. High-level metadata gives a comprehensive semantic description of the data. General methods to create high-level metadata are still under development, but techniques for automatic generation of high-level metadata are highly advanced for special cases like face recognition, which gives the name of a person shown in the video sequence, or topic recognition, which gives the type of subject being discussed in the audio.

Usually the level of generated metadata increases along the processing chain, which starts with low-level preprocessing and ends with high-level methods. Since computational complexity generally increases with the level of metadata, it is essential to have efficient methods at all levels of processing.
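As a concrete instance of a low-level feature mentioned above, the following sketch computes an intensity histogram over raw pixel values. The image data and bin count are illustrative assumptions; a color histogram would apply the same binning per channel.

```python
# Minimal low-level feature example: an intensity histogram with a
# configurable number of equal-width bins. Image data is synthetic.

def intensity_histogram(pixels, bins=8, max_value=256):
    """Count pixel intensities in [0, max_value) into equal-width bins."""
    hist = [0] * bins
    width = max_value / bins
    for p in pixels:
        # Clamp to the last bin so p == max_value - 1 stays in range.
        hist[min(int(p / width), bins - 1)] += 1
    return hist

image = [0, 10, 31, 32, 255, 200, 64, 64]
print(intensity_histogram(image))  # -> [3, 1, 2, 0, 0, 0, 1, 1]
```

Such features are cheap to compute and, as the text notes, typically feed query-by-example retrieval or later mid- and high-level analysis stages.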

MPEG-7 is a common standard for storing the metadata generated by many automatic video-analysis systems. One video-analysis system that automatically produces metadata conforming to the MPEG-7 standard [1] is the iFinder, developed by the Fraunhofer IMK. The algorithms presented in this paper are included as a new module in the iFinder; they extend its analysis capabilities with additional metadata such as camera motion and speed up the existing methods. Most digital video in high quality, such as DVD or DVB (digital television), is encoded in the MPEG-2 compression standard. MPEG is an ISO standard.