Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks



Junyuan Xie, Ross Girshick, and Ali Farhadi

University of Washington, Seattle, USA
Allen Institute for Artificial Intelligence, Seattle, USA
{jxie,rbg,ali}@cs.washington.edu

Abstract. As 3D movie viewing becomes mainstream and the Virtual Reality (VR) market emerges, the demand for 3D content is growing rapidly. Producing 3D videos, however, remains challenging. In this paper we propose to use deep neural networks to automatically convert 2D videos and images to a stereoscopic 3D format. In contrast to previous automatic 2D-to-3D conversion algorithms, which have separate stages and need ground-truth depth maps as supervision, our approach is trained end-to-end directly on stereo pairs extracted from existing 3D movies. This novel training scheme makes it possible to exploit orders of magnitude more data and significantly increases performance. Indeed, Deep3D outperforms baselines in both quantitative and human subject evaluations.
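To make the training signal described above concrete, the sketch below computes a per-pixel reconstruction error between a predicted right view and the right view taken from an existing stereo pair. The use of an L1 penalty, the array shapes, and the variable names are illustrative assumptions for this excerpt, not details taken from the paper.

```python
import numpy as np

def reconstruction_loss(pred_right, true_right):
    """Mean per-pixel L1 error between a predicted right view and the
    right view extracted from an existing stereo pair (assumed loss form)."""
    pred = pred_right.astype(np.float64)
    true = true_right.astype(np.float64)
    return np.abs(pred - true).mean()

# Toy usage with hypothetical H x W x 3 frames standing in for a stereo pair.
left_view = np.random.rand(180, 320, 3)    # network input (unused in this sketch)
true_right = np.random.rand(180, 320, 3)   # supervision taken from a 3D movie frame
pred_right = np.random.rand(180, 320, 3)   # placeholder for the network's output
print(reconstruction_loss(pred_right, true_right))
```

Because the supervision comes directly from stereo pairs rather than depth sensors, any 3D movie can serve as training data for such an objective.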

Keywords: Monocular stereo reconstruction · Deep convolutional neural networks

1 Introduction

3D movies are popular and comprise a large segment of the movie theater market, ranging between 14% and 21% of all box office sales between 2010 and 2014 in the U.S. and Canada [1]. Moreover, the emerging market of Virtual Reality (VR) head-mounted displays will likely drive an increased demand for 3D content (Fig. 1).

3D videos and images are usually stored in stereoscopic format. For each frame, the format includes two projections of the same scene, one of which is exposed to the viewer's left eye and the other to the viewer's right eye, thus giving the viewer the experience of seeing the scene in three dimensions.

There are two approaches to making 3D movies: shooting natively in 3D or converting to 3D after shooting in 2D. Shooting in 3D requires costly special-purpose stereo camera rigs.


Fig. 1. We propose Deep3D, a fully automatic 2D-to-3D conversion algorithm that takes 2D images or video frames as input and outputs stereo 3D image pairs. The stereo images can be viewed with 3D glasses or head-mounted VR displays. Deep3D is trained directly on stereo pairs from a dataset of 3D movies to minimize the pixel-wise reconstruction error of the right view when given the left view. Internally, the Deep3D network estimates a probabilistic disparity map that is used by a differentiable depth image-based rendering layer to produce the right view. Thus Deep3D does not require collecting depth sensor data for supervision.
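To illustrate the rendering step described in the caption, here is a minimal NumPy sketch of a selection-style layer: per-pixel scores over a set of candidate disparities are softmax-normalized into a probabilistic disparity map, and the right view is formed as a probability-weighted sum of horizontally shifted copies of the left view. The function names, disparity range, shift sign convention, and the wrap-around border handling via np.roll are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def render_right_view(left, disparity_logits, disparities):
    """Probability-weighted blend of shifted left views.

    left:             H x W x 3 left view.
    disparity_logits: D x H x W per-pixel scores, one channel per candidate
                      disparity value (would come from the CNN in practice).
    disparities:      length-D sequence of integer horizontal shifts.
    """
    probs = softmax(disparity_logits, axis=0)       # probabilistic disparity map
    right = np.zeros_like(left, dtype=np.float64)
    for k, d in enumerate(disparities):
        # Shift columns by d; np.roll wraps around, a real layer would pad.
        shifted = np.roll(left, shift=-d, axis=1)
        right += probs[k][..., None] * shifted
    return right

# Toy usage with hypothetical shapes and a small disparity range.
H, W, D = 8, 16, 5
left = np.random.rand(H, W, 3)
logits = np.random.randn(D, H, W)
print(render_right_view(left, logits, disparities=range(D)).shape)
```

Because the output is a weighted sum rather than a hard pixel lookup, the whole rendering step stays differentiable, which is what allows training end-to-end from the reconstruction error alone.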

Aside from equipment costs, there are cinematographic issues that may preclude the use of stereo camera rigs. For example, some inexpensive optical special effects, such as forced perspective, are not compatible with multi-view capture.