Functional Object Class Detection Based on Learned Affordance Cues



Computer Science Department, TU Darmstadt, Germany
{stark,lies,schiele}@informatik.tu-darmstadt.de

School of Computer Science, University of Birmingham, United Kingdom
{mxz,jlw}@cs.bham.ac.uk

Abstract. Current approaches to visual object class detection mainly focus on the recognition of basic level categories, such as cars, motorbikes, mugs and bottles. Although these approaches have demonstrated impressive recognition performance, their restriction to these categories seems inadequate in the context of embodied, cognitive agents. Here, distinguishing objects according to functional aspects based on object affordances is important in order to enable manipulation of, and interaction between, physical objects and a cognitive agent. In this paper, we propose a system for the detection of functional object classes, based on a representation of visually distinct hints on object affordances (affordance cues). It spans the complete range from tutor-driven acquisition of affordance cues, through learning of corresponding object models, to detecting novel instances of functional object classes in real images.

Keywords: Functional object categories, object affordances, object category detection, object recognition.

1 Introduction and Related Work

In recent years, computer vision has made tremendous progress in the field of object category detection. Diverse approaches based on local features, such as simple bag-of-words methods [2], have shown impressive results for the detection of a variety of different objects. More recently, adding spatial information has resulted in a boost in performance [10], and combining different cues has pushed the limits even further. One of the driving forces behind object category detection is a widely adopted collection of publicly available data sets [3,7], which is considered an important instrument for measuring and comparing the detection performance of different methods. The basis for comparison is given by a set of rather abstract, basic level categories [15]. These categories are grounded in cognitive psychology, and category instances typically share characteristic visual properties.

M. Stark et al. In: A. Gasteratos, M. Vincze, and J.K. Tsotsos (Eds.): ICVS 2008, LNCS 5008, pp. 435–444, 2008. © Springer-Verlag Berlin Heidelberg 2008

Fig. 1. Basic level (left) vs. functional (right) object categories

In the context of embodied cognitive agents, however, different criteria for the formation of categories seem more appropriate. Ideally, an embodied, cognitive agent (e.g., an autonomous robot) would be capable of categorizing and detecting objects according to their potential uses and with respect to their utility in performing a certain task. This functional definition of object categories is related to the notion of affordances pioneered by [6]. Fig. 1 exemplifies the differentiation between functional and basic level categories, and highlights the following two key properties: 1) functional categories may generalize across and beyond basic level categories (both a mug and a watering-can are handle-gra