Real-Time Adaptive Foreground/Background Segmentation

Darren E. Butler
Information Security Institute, Queensland University of Technology, Brisbane QLD 4001, Australia
Email: [email protected]

V. Michael Bove Jr.
Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Email: [email protected]

Sridha Sridharan
Information Security Institute, Queensland University of Technology, Brisbane QLD 4001, Australia
Email: [email protected]

Received 2 January 2004; Revised 8 November 2004

The automatic analysis of digital video scenes often requires the segmentation of moving objects from a static background. Historically, algorithms developed for this purpose have been restricted to small frame sizes, low frame rates, or offline processing. The simplest approach involves subtracting the current frame from the known background. However, as the background is rarely known beforehand, the key is how to learn and model it. This paper proposes a new algorithm that represents each pixel in the frame by a group of clusters. The clusters are sorted in order of the likelihood that they model the background and are adapted to deal with background and lighting variations. Incoming pixels are matched against the corresponding cluster group and are classified according to whether the matching cluster is considered part of the background. The algorithm has been qualitatively and quantitatively evaluated against three other well-known techniques. It demonstrated equal or better segmentation and proved capable of processing 320 × 240 PAL video at full frame rate using only 35%–40% of a 1.8 GHz Pentium 4 computer.

Keywords and phrases: video segmentation, background segmentation, real-time video processing.
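The per-pixel cluster scheme summarised above can be illustrated with a minimal sketch. This is not the paper's actual implementation: the cluster count, matching threshold, adaptation rate, and background fraction below are illustrative assumptions, and a scalar intensity stands in for the paper's pixel representation.

```python
MATCH_T = 20.0   # colour-distance threshold for a match (assumed)
ALPHA = 0.05     # adaptation rate toward the incoming value (assumed)
BG_FRAC = 0.7    # fraction of total weight treated as background (assumed)

def classify_pixel(clusters, value):
    """Classify one pixel against its cluster group and adapt the model.

    clusters: list of [centroid, weight], kept sorted with the most
    likely background cluster first. Returns True if `value` is
    classified as background. All parameters are illustrative.
    """
    total = sum(w for _, w in clusters) or 1.0
    cum = 0.0
    for i, (centroid, weight) in enumerate(clusters):
        # A matched cluster counts as background if it ranks within the
        # portion of the group covering BG_FRAC of all observations.
        is_background = cum / total < BG_FRAC
        cum += weight
        if abs(centroid - value) < MATCH_T:
            # Adapt the matching cluster and re-sort by likelihood.
            clusters[i][0] = (1 - ALPHA) * centroid + ALPHA * value
            clusters[i][1] = weight + 1.0
            clusters.sort(key=lambda c: c[1], reverse=True)
            return is_background
    # No match: replace the least likely cluster with a new one, which
    # starts out as foreground until its weight grows.
    clusters[-1] = [float(value), 1.0]
    return False
```

A pixel close to a heavily weighted cluster is reported as background, while an unmatched value spawns a low-weight cluster and is reported as foreground; this mirrors the sorting-and-matching behaviour the abstract describes, under the stated assumptions.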

1. INTRODUCTION

As humans, we possess an innate ability to decompose arbitrary scenes: with only a casual glance we can recognise a multitude of shapes, shades, and textures. In contrast, computers require enormous amounts of processing power and frequently fail if, for instance, the sun hides behind a cloud. As a consequence, practical solutions often rely upon domain-specific knowledge to make the problem tractable. For instance, if we know that we are compressing a head-and-shoulders sequence, then we also know that most of the information pertains to the participant. Hence, we can employ differential bit allocation and encode their facial expressions and gestures at a higher quality than the background. Alternatively, we could try to fit a parameterised model to the participant [1] or derive one from them [2] and thereby obtain even greater compression. In either case, the first step is to segment the participant from the background, and we can exploit their movements to help us do so.

Motion is a particularly important cue for computer vision. Indeed, for many applications, the simple fact that something is moving makes it of interest and anything else can be ignored. In such cases, it is common for moving objects to be referred to as the foreground and stationary objects as the background. A clas