Unified Depth Prediction and Intrinsic Image Decomposition from a Single Image via Joint Convolutional Neural Fields

1 Yonsei University, Seoul, South Korea
{srkim89,khpark7727,khsohn}@yonsei.ac.kr
2 Microsoft Research, Redmond, USA
[email protected]

Abstract. We present a method for jointly predicting a depth map and intrinsic images from single-image input. The two tasks are formulated in a synergistic manner through a joint conditional random field (CRF) that is solved using a novel convolutional neural network (CNN) architecture, called the joint convolutional neural field (JCNF) model. Tailored to our joint estimation problem, JCNF differs from previous CNNs in its sharing of convolutional activations and layers between the networks for each task, its inference in the gradient domain, where there exists greater correlation between depth and intrinsic images, and its incorporation of a gradient scale network that learns the confidence of estimated gradients in order to effectively balance them in the solution. This approach is shown to surpass state-of-the-art methods on both single-image depth estimation and intrinsic image decomposition.

Keywords: Single-image depth estimation · Intrinsic image decomposition · Conditional random field · Convolutional neural networks
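The abstract's central architectural idea, sharing convolutional activations between the per-task networks so that depth and intrinsic predictions are computed from common features, can be illustrated with a toy NumPy forward pass. This is a minimal sketch only, not the JCNF architecture or its training procedure; `conv2d`, `k_shared`, and the head kernels are hypothetical names introduced here for illustration.

```python
import numpy as np

def conv2d(x, k):
    """'Valid' single-channel 2-D convolution, stride 1 (illustrative, unoptimized)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

relu = lambda z: np.maximum(z, 0.0)

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))

# Shared trunk: one convolutional layer whose activations are reused by both tasks.
k_shared = rng.standard_normal((3, 3))
shared = relu(conv2d(image, k_shared))          # 16x16 -> 14x14

# Task-specific heads operate on the SAME shared activations, so evidence
# extracted for one task is available to the other.
k_depth = rng.standard_normal((3, 3))
k_albedo = rng.standard_normal((3, 3))
depth_grad = conv2d(shared, k_depth)            # 14x14 -> 12x12
albedo_grad = conv2d(shared, k_albedo)          # 14x14 -> 12x12
```

Note that the heads here output gradient-like maps only in name; in the paper, inference is actually carried out in the gradient domain and the learned per-pixel confidences (the gradient scale network) balance the two tasks' estimates, which this sketch does not model.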

1 Introduction

Perceiving the physical properties of a scene undoubtedly plays a fundamental role in understanding real-world imagery. Such inherent properties include the 3-D geometric configuration, the illumination or shading, and the reflectance or albedo of each scene surface. Depth prediction and intrinsic image decomposition, which aims to recover shading and albedo, are thus two fundamental yet challenging tasks in computer vision. While they address different aspects of scene understanding, there exist strong consistencies between depth and intrinsic images, such that information about one provides valuable prior knowledge for recovering the other.

S. Kim — This work was done while Seungryong Kim was an intern at Microsoft Research.
Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46484-8_9) contains supplementary material, which is available to authorized users.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VIII, LNCS 9912, pp. 143–159, 2016. DOI: 10.1007/978-3-319-46484-8_9

In the intrinsic image decomposition literature, several works have exploited measured depth information to make the decomposition problem more tractable [1–5]. These techniques have all demonstrated better performance than using RGB images alone. Conversely, in the single-image depth prediction literature, illumination-invariant features have been utilized for greater robustness in depth inference [6,7], and shading discontinuities have been used to detect surface boundaries [8], suggesting that intrinsic images can likewise enhance depth prediction. Although the two tasks are mutually beneficial, most previous research has solved them only in sequence, by using estimated intrinsic images to constrain depth prediction [8], or vice versa [9]. We propose