Object affordance detection with relationship-aware network

EXTREME LEARNING MACHINE AND DEEP LEARNING NETWORKS

Xue Zhao¹ · Yang Cao¹ · Yu Kang¹

Received: 30 November 2018 / Accepted: 28 June 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019

Abstract

Object affordance detection, which aims to understand the functional attributes of objects, is of great significance for an autonomous robot to achieve humanlike object manipulation. In this paper, we propose a novel relationship-aware convolutional neural network that takes into account both the symbiotic relationship between multiple affordances and the combinational relationship between affordance and objectness, in order to predict the most probable affordance label for each pixel of the object. Unlike existing CNN-based methods that rely on a separate, intermediate object detection step, our proposed network produces pixel-wise affordance maps directly from an input image in an end-to-end manner. Specifically, the network has three key components: a Coord-ASPP module, which introduces CoordConv into atrous spatial pyramid pooling (ASPP) to refine the feature maps; a relationship-aware module, which links affordances to their corresponding objects to explore these relationships; and an online sequential extreme learning machine auxiliary attention module, which focuses on individual affordances to further assist the relationship-aware module. Experimental results on two public datasets show the merits of each module and demonstrate the superiority of our relationship-aware network over state-of-the-art methods.

Keywords Object affordance detection · Convolutional neural network · Relationship-aware · Online sequential extreme learning machine
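As a rough illustration of the CoordConv idea named in the abstract, the sketch below appends two normalized coordinate channels to a feature map before it would be passed to a convolution. This is only a minimal, dependency-free sketch of the general CoordConv technique; the function name `add_coord_channels`, the nested-list tensor layout, and the [-1, 1] scaling are assumptions made for illustration, and the sketch does not describe how the authors combine CoordConv with ASPP.

```python
def add_coord_channels(feature_map):
    """Append normalized row and column coordinate channels.

    feature_map: list of C channels, each an H x W list of lists
    (hypothetical layout chosen for this sketch).
    Returns a new list with C + 2 channels; the input is not modified.
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])

    def norm(index, size):
        # Scale an index in [0, size - 1] to [-1, 1]; a single-element
        # axis maps to 0.0 to avoid division by zero.
        return 0.0 if size == 1 else 2.0 * index / (size - 1) - 1.0

    # Row-coordinate channel: constant along each row.
    y_channel = [[norm(r, height) for _ in range(width)] for r in range(height)]
    # Column-coordinate channel: constant along each column.
    x_channel = [[norm(c, width) for c in range(width)] for _ in range(height)]

    return feature_map + [y_channel, x_channel]


# Usage: one 2 x 3 channel becomes three channels.
features = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]
augmented = add_coord_channels(features)
```

A convolution applied to the augmented tensor can condition on where a pixel lies in the image, which is the motivation usually given for CoordConv.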

Corresponding author: Yang Cao ([email protected])

¹ Department of Automation, University of Science and Technology of China, Hefei, China

1 Introduction

Gibson [1] defines affordances, or functional attributes of objects, as the latent "action possibilities" available to an agent, given the agent's capabilities and the environment. In this sense, a hammer, for example, usually has two different affordances: one part affords pounding and the other affords grasping. When humans interact with the real world, we focus on understanding the different functions of objects in order to carry out a certain action. Similarly, for an autonomous robot collaborating with humans, understanding object affordances is of great significance for achieving humanlike object manipulation. Imagine that we ask a robot to use a hammer to pound something. Current computer vision techniques allow the robot to recognize the hammer and localize it very accurately. However, to complete the specific task, the robot also needs to know which part of the hammer can be grasped and which part can be used to pound. The problem of perceiving affordances at the pixel level has been termed "object part labelling" in the computer vision community, while it is more commonly known as "affordance detection" in robo