A Multi-scale CNN for Affordance Segmentation in RGB Images

Abstract. Given a single RGB image, our goal is to label every pixel with an affordance type. By affordance, we mean an object's capability to readily support a certain human action, without requiring precursor actions. We focus on segmenting the following five affordance types in indoor scenes: 'walkable', 'sittable', 'lyable', 'reachable', and 'movable'. Our approach uses a deep architecture consisting of a number of multi-scale convolutional neural networks that extract mid-level visual cues and combine them toward affordance segmentation. The mid-level cues include a depth map, surface normals, and a segmentation into four surface types: floor, structure, furniture, and props. For evaluation, we augmented the NYUv2 dataset with new ground-truth annotations of the five affordance types. We are not aware of prior work that starts from pixels, infers mid-level cues, and combines them in a feed-forward fashion to predict dense affordance maps from a single RGB image.
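
To make the pipeline concrete, below is a minimal sketch (in PyTorch, not the authors' implementation) of a feed-forward architecture of the kind the abstract describes: per-cue multi-scale convolutional branches predict depth, surface normals, and the four surface types from the RGB image, and a fusion head combines the predicted cues with the image to output per-pixel logits over the five affordance classes. All module names, layer widths, and the simple two-scale scheme are illustrative assumptions, not the paper's actual design.

```python
# Hedged sketch of a cue-then-fuse affordance segmentation network.
# Branch depths, channel counts, and the two-scale merge are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CueBranch(nn.Module):
    """Fully convolutional branch predicting one mid-level cue map.

    Runs the same feature extractor at two image scales and merges them,
    a crude stand-in for the paper's multi-scale design.
    """
    def __init__(self, out_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(64, out_channels, 1)

    def forward(self, rgb):
        fine = self.features(rgb)
        coarse = self.features(F.avg_pool2d(rgb, 2))   # half-resolution pass
        coarse = F.interpolate(coarse, size=fine.shape[-2:],
                               mode='bilinear', align_corners=False)
        return self.head(torch.cat([fine, coarse], dim=1))

class AffordanceNet(nn.Module):
    """Predicts mid-level cues, then fuses them into affordance logits."""
    def __init__(self, num_affordances=5):
        super().__init__()
        self.depth_branch = CueBranch(1)     # depth map
        self.normal_branch = CueBranch(3)    # surface normals
        self.surface_branch = CueBranch(4)   # floor/structure/furniture/props
        # Fusion head: RGB (3) + depth (1) + normals (3) + surfaces (4) = 11.
        self.fusion = nn.Sequential(
            nn.Conv2d(11, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_affordances, 1),
        )

    def forward(self, rgb):
        cues = [self.depth_branch(rgb),
                self.normal_branch(rgb),
                self.surface_branch(rgb)]
        fused = torch.cat([rgb] + cues, dim=1)
        return self.fusion(fused)  # per-pixel affordance logits

# Usage: one forward pass on a dummy image.
net = AffordanceNet()
logits = net(torch.randn(1, 3, 240, 320))
labels = logits.argmax(dim=1)  # dense 5-way affordance map
```

In the actual system the per-cue networks are substantially deeper and trained with cue-specific supervision; the sketch only mirrors the feed-forward, infer-cues-then-fuse structure stated in the abstract.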

Keywords: Object affordance · Mid-level cues · Deep learning

1 Introduction

This paper addresses the problem of affordance segmentation in an image, where the goal is to label every pixel with an affordance type. By affordance, we mean an object's capability to support a certain human action [1,2]. For example, when a surface in the scene affords a person the opportunity to walk, sit, or lie down on it, we say that the surface is characterized by the affordance types 'walkable', 'sittable', or 'lyable'. Also, an object may be 'reachable' when someone standing on the floor can readily grasp it. A surface or an object may be characterized by a number of affordance types. Importantly, the affordance of an object expresses only the possibility of some action, subject to the object's relationships with the environment, and thus is not an inherent (permanent) attribute of the object. For instance, a chair is not 'sittable' and a floor is not 'walkable' if other objects in the environment prevent the corresponding actions.


Affordance segmentation is an important, long-standing problem with a range of applications, including robot navigation, path planning, and autonomous driving [3–14]. Reasoning about affordances has been shown to facilitate object and action recognition [4,10,13]. Existing work typically leverages mid-level visual cues [3] to reason about spatial (and temporal) relationships among objects in the scene, which are then used for detection (and in some cases segmentation) of affordances in the image (or video). For example, Hoiem et al. [15,16] show that inferring mid-level cues, including a depth map, semantic cues, and occlusion maps, facilitates such reasoning.