Context Driven Focus of Attention for Object Detection

Abstract. Context plays an important role in general scene perception. In particular, it can provide cues about an object’s location within an image. In computer vision, object detectors typically ignore this information. We tackle this problem by presenting a concept of how to extract and learn contextual information from examples. This context is then used to calculate a focus of attention that represents a prior for object detection. State-of-the-art local appearance-based object detection methods are then applied only to selected parts of the image. We demonstrate the performance of this approach on the task of pedestrian detection in urban scenes using a demanding image database. Results show that context awareness provides information complementary to pure local appearance-based processing. In addition, it cuts down the search complexity and increases the robustness of object detection.
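The idea of applying a detector only to selected parts of the image can be illustrated with a small sketch. This is not the authors' implementation: it assumes the focus of attention is available as a 2D map of prior values in [0, 1], that a sliding window is kept when its mean prior reaches a threshold, and that a local appearance-based detector is available as a scoring function. The names `candidate_windows`, `detect_with_focus`, and `score_window`, and both thresholds, are hypothetical.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int, int, int]  # (row, col, height, width)


def candidate_windows(
    prior: List[List[float]],
    win_h: int,
    win_w: int,
    stride: int,
    threshold: float,
) -> List[Window]:
    """Return the sliding windows whose mean prior is at least `threshold`.

    `prior` is an assumed focus-of-attention map with values in [0, 1].
    """
    rows, cols = len(prior), len(prior[0])
    selected: List[Window] = []
    for r in range(0, rows - win_h + 1, stride):
        for c in range(0, cols - win_w + 1, stride):
            total = sum(
                prior[rr][cc]
                for rr in range(r, r + win_h)
                for cc in range(c, c + win_w)
            )
            if total / (win_h * win_w) >= threshold:
                selected.append((r, c, win_h, win_w))
    return selected


def detect_with_focus(
    prior: List[List[float]],
    score_window: Callable[[Window], float],
    win_h: int,
    win_w: int,
    stride: int,
    prior_threshold: float,
    score_threshold: float,
) -> List[Tuple[Window, float]]:
    """Run the (hypothetical) local detector only inside the focus of attention."""
    detections: List[Tuple[Window, float]] = []
    for window in candidate_windows(prior, win_h, win_w, stride, prior_threshold):
        score = score_window(window)  # local appearance-based score
        if score >= score_threshold:
            detections.append((window, score))
    return detections


if __name__ == "__main__":
    # Toy prior: objects are assumed likely only in the lower half of the image.
    toy_prior = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [1.0] * 4]
    windows = candidate_windows(toy_prior, win_h=2, win_w=2, stride=2, threshold=0.5)
    print(f"{len(windows)} of 4 windows are passed to the detector")
```

In this toy setting the detector is evaluated on half of the windows an exhaustive search would visit, which is the sense in which a prior reduces search complexity. How the prior is learned from contextual examples is not covered by the sketch.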

1 Introduction

In the real world there exists a strong relationship between the environment and the objects that can be found within it. Experiments with scene perception, interpretation, and understanding have shown that the human visual system extensively uses these relationships to make object detection and recognition more reliable [1,2,3]. In the proper context, humans can identify a given object in a scene, even if they would not normally recognize the same object when it is presented in isolation. The limitation of a local appearance being too vague is resolved by using contextual information and by applying a reasoning mechanism to identify the object of interest. An example is shown in Fig. 1, where most people have little trouble in recognizing the marked objects in the image. However, shown in isolation, an indisputable recognition of these patches is not easily achieved.

Fig. 1. The object hypothesis formed from local appearance is rather weak for a unique object recognition. Using the surroundings of the patches significantly aids recognition.

In general, context plays a useful role in object detection in at least two ways. First, it helps detection when local intrinsic information about the object is insufficient. Second, even when local appearance-based object detection is possible, the search space can be cut down by attending to image regions where the occurrence of the objects of interest is most likely. For example, when searching for manholes in Fig. 1 the search can be constrained to the ground plane.

Even though object detection is a well-established discipline in computer vision and is used in a large number of applications, contextual information is typically ignored. Many concepts for object detection have been developed where, independent of the particular representation model used, the employed object detector is based on local appearance alone (see e.g. [4] for a review). Standard representation models are bag-of-feature

L. Paletta and E. Rome (Eds.): WAPCV 2007, LNAI 4840, pp. 216–233, 2007. © Springer-Verlag Berlin Heidelberg 2007
