DOC: Deep OCclusion Estimation from a Single Image
Peng Wang
University of California, Los Angeles, USA
[email protected]

Alan Yuille
Johns Hopkins University, Baltimore, USA
[email protected]
Abstract. In this paper, we propose a deep convolutional network architecture, called DOC, which detects object boundaries and estimates the occlusion relationships (i.e. which side of the boundary is foreground and which is background). Specifically, we first represent occlusion relations by a binary edge indicator, to indicate the object boundary, and an occlusion orientation variable whose direction specifies the occlusion relationship by a left-hand rule (see Fig. 1). Then, our DOC networks exploit local and non-local image cues to learn and estimate this representation and hence recover occlusion relations. To train and test DOC, we construct a large-scale instance occlusion boundary dataset using PASCAL VOC images, which we call the PASCAL instance occlusion dataset (PIOD). It contains 10,000 images and hence is two orders of magnitude larger than existing occlusion datasets for outdoor images. We test two variants of DOC on PIOD and on the BSDS ownership dataset and show that they outperform state-of-the-art methods, typically by more than 5 AP. Finally, we perform numerous experiments investigating multiple settings of DOC and transfer between BSDS and PIOD, which provides further insights for the study of occlusion estimation.
1 Introduction
Humans are able to recover the occlusion relationships of objects from single images. This has long been recognized as an important ability for scene understanding and perception [4,15]. As shown on the left of Fig. 1, we can use occlusion relationships to deduce that the person is holding a dog, because the person's hand occludes the dog and the dog occludes the person's body. Electrophysiological [18] and fMRI [13] studies suggest that occlusion relationships are detected as early as visual area V2. Biological studies [9] also suggest that occlusion detection can require feedback from higher-level cortical regions, indicating that long-range context and semantic-level knowledge may be needed. Psychophysical studies show that there are many cues for occlusion, including edge convexity [23], edge junctions, intensity gradients, and texture [35].
Fig. 1. Left: Occlusion boundaries represented by orientation θ (the red arrows), which indicates the occlusion relationship using the "left" rule: the left side of each arrow is foreground. Right: More examples from our PASCAL instance occlusion dataset (PIOD). (Color figure online)
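The "left" rule in the caption above has a simple operational reading. The following minimal Python sketch is not from the paper: the function names, the probability threshold, and the math-style coordinate convention (x right, y up) are assumptions made here for illustration. It decodes the two-variable representation, a binary edge indicator plus an orientation θ, into a unit normal that points toward the foreground side.

```python
import numpy as np

def foreground_normal(theta):
    """Return the unit normal pointing to the foreground side of a
    boundary pixel with occlusion orientation theta (radians).

    Under the "left" rule, the tangent points along the boundary in
    direction theta and the foreground lies on its left. Assumes
    math-style coordinates (x right, y up); with image coordinates
    (y down) the sign of the rotation flips.
    """
    tangent = np.array([np.cos(theta), np.sin(theta)])
    # Rotating the tangent by +90 degrees gives the left-hand side.
    return np.array([-tangent[1], tangent[0]])

def decode_occlusion(edge_prob, theta, threshold=0.5):
    """Decode the representation at one pixel: a binary edge
    indicator (thresholded here from a probability, an assumption)
    plus the orientation variable. Returns None off-boundary,
    otherwise the foreground-pointing normal."""
    if edge_prob < threshold:
        return None  # not an object boundary pixel
    return foreground_normal(theta)

# Example: theta = 0 means the tangent points right, so under this
# convention the foreground normal points up, approximately [0, 1].
print(decode_occlusion(0.9, 0.0))  # foreground normal, ~[0, 1]
print(decode_occlusion(0.2, 0.0))  # None: below the edge threshold
```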
Computer vision researchers have also used similar cues for estimating occlusion relations. A standard strategy is to apply machine learning techniques to combine cues like those described above.