Multi-view 3D Models from Single Images with a Convolutional Network

Abstract. We present a convolutional network capable of inferring a 3D representation of a previously unseen object given a single image of this object. Concretely, the network can predict an RGB image and a depth map of the object as seen from an arbitrary view. Several of these depth maps fused together give a full point cloud of the object. The point cloud can in turn be transformed into a surface mesh. The network is trained on renderings of synthetic 3D models of cars and chairs. It successfully deals with objects on cluttered backgrounds and generates reasonable predictions for real images of cars.

Keywords: 3D from single image · Deep learning · Convolutional networks

1 Introduction

The ability to infer a 3D model of an object from a single image is necessary for human-level scene understanding. Despite the large success of deep learning in computer vision and the diversity of tasks being approached, 3D representations are not yet a focus of deep networks. Can we make deep networks learn such 3D representations?

In this paper, we present a simple and elegant encoder-decoder network that infers a 3D model of an object from a single image of this object, see Fig. 1. We represent the object by what we call a “multi-view 3D model” – the set of all its views and the corresponding depth maps. Given an arbitrary viewpoint, the proposed network generates an RGB image of the object and the corresponding depth map. This representation contains rich information about the 3D geometry of the object, but allows for a more efficient implementation than voxel-based 3D models. By fusing several views from our multi-view representation we obtain a full 3D point cloud of the object, including parts invisible in the original input image.

While the task technically comes with many ambiguities, humans are known to be good at using their prior knowledge about similar objects to guess the missing information.
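To make the architecture concrete, a minimal encoder-decoder sketch is given below (in PyTorch). This is an illustration rather than the authors' exact network: all layer sizes, the 4-dimensional viewpoint encoding (e.g. sines and cosines of azimuth and elevation), and the name MultiViewNet are our own assumptions.

# Minimal sketch (not the authors' exact architecture) of an encoder-decoder
# that maps an input image plus a target viewpoint to an RGB image and a
# depth map of that view. Layer sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class MultiViewNet(nn.Module):
    def __init__(self, code_dim=512, view_dim=4):
        super().__init__()
        # Encoder: compress the 128x128 input image into a latent code.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(256 * 8 * 8, code_dim), nn.ReLU(),
        )
        # Fuse the image code with the desired viewpoint encoding
        # before decoding.
        self.fuse = nn.Sequential(
            nn.Linear(code_dim + view_dim, 256 * 8 * 8), nn.ReLU(),
        )
        # Decoder: upconvolutions back to image resolution; the final layer
        # emits 4 channels = 3 for RGB + 1 for depth.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 4, 4, stride=2, padding=1),
        )

    def forward(self, image, view):
        code = self.encoder(image)                     # (B, code_dim)
        h = self.fuse(torch.cat([code, view], dim=1))  # (B, 256*8*8)
        out = self.decoder(h.view(-1, 256, 8, 8))      # (B, 4, 128, 128)
        return out[:, :3], out[:, 3:]                  # RGB, depth

Querying the same latent code with many different viewpoint vectors yields the set of views and depth maps that constitutes the multi-view 3D model.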


Fig. 1. Our network infers an object’s 3D representation from a single input image. It then predicts unseen views of this object and their depth maps. Multiple such views are fused into a full 3D point cloud, which is further optimized to obtain a mesh.

The same is achieved by the proposed network: when the input image does not allow the network to infer some part of the object – for example, because the input only shows the front view of a car and carries no information about its back – it fantasizes the most probable shape consistent with the presented data (for example, a standard sedan car).

The network is trained end-to-end on renderings of 3D models from the ShapeNet dataset [1]. We render images on the fly during network training, w
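The fusion step mentioned above is plain geometry: each predicted depth map is unprojected with its known viewpoint into a common world frame, and the resulting points are concatenated. The numpy sketch below illustrates this; the pinhole intrinsics K, the world-to-camera poses (R, t), and the helper names unproject/fuse are illustrative assumptions, not details taken from the paper.

# Sketch of the fusion step: each predicted depth map is unprojected with
# its (known) camera pose into a shared world frame; concatenating the
# results yields a full point cloud. Intrinsics and poses are assumptions.
import numpy as np

def unproject(depth, K, R, t):
    """Lift an HxW depth map to 3D world points.

    depth: HxW array of depths along the camera z-axis.
    K:     3x3 camera intrinsics; R, t: world-to-camera rotation/translation.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T        # camera-frame rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)  # scale rays by predicted depth
    valid = depth.reshape(-1) > 0          # drop background pixels
    return (pts_cam[valid] - t) @ R        # camera -> world: R^T (x - t)

def fuse(depth_maps, K, poses):
    """Merge depth maps predicted for several viewpoints into one cloud."""
    return np.concatenate(
        [unproject(d, K, R, t) for d, (R, t) in zip(depth_maps, poses)])

Since the viewpoints are the ones we query the network with, their poses are known exactly; the fused cloud can then be converted into a surface mesh with any standard surface-reconstruction method.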