3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
Abstract. Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data [13]. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most previous works, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework (i) outperforms the state-of-the-art methods for single-view reconstruction, and (ii) enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
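To make the data flow concrete, the sketch below (in PyTorch) illustrates the kind of pipeline the abstract describes: a shared 2D encoder summarizes each input view, a gated recurrent update folds that evidence into a 3D hidden grid, and a 3D decoder emits per-voxel occupancy probabilities. This is a minimal illustration only, not the authors' exact architecture; the class name `Recon3DSketch`, all layer sizes, and the GRU-style update are hypothetical stand-ins for the paper's 3D convolutional LSTM.

```python
# Minimal sketch (NOT the authors' exact architecture) of the 3D-R2N2 idea:
# encode each view with a shared 2D CNN, accumulate evidence in a 3D hidden
# grid with a gated recurrent update, then decode per-voxel occupancy.
# All layer sizes below are hypothetical.
import torch
import torch.nn as nn

class Recon3DSketch(nn.Module):
    def __init__(self, feat_dim=128, hidden_ch=32, grid=4):
        super().__init__()
        self.grid, self.hidden_ch = grid, hidden_ch
        # 2D encoder: image -> flat feature vector (shared across views)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Project the view feature onto the 3D hidden grid, then blend it in
        # with a GRU-style gate (stand-in for the paper's 3D conv-LSTM).
        self.input_proj = nn.Linear(feat_dim, hidden_ch * grid ** 3)
        self.update_gate = nn.Conv3d(2 * hidden_ch, hidden_ch, 3, padding=1)
        self.candidate = nn.Conv3d(2 * hidden_ch, hidden_ch, 3, padding=1)
        # 3D decoder: hidden grid -> upsampled occupancy logits
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(hidden_ch, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, views):  # views: (batch, n_views, 3, H, W)
        b, n = views.shape[:2]
        h = views.new_zeros(b, self.hidden_ch, self.grid, self.grid, self.grid)
        for t in range(n):  # fold in one arbitrary-viewpoint image at a time
            f = self.input_proj(self.encoder(views[:, t]))
            x = f.view(b, self.hidden_ch, self.grid, self.grid, self.grid)
            z = torch.sigmoid(self.update_gate(torch.cat([x, h], dim=1)))
            c = torch.tanh(self.candidate(torch.cat([x, h], dim=1)))
            h = (1 - z) * h + z * c  # gated blend of old and new evidence
        return torch.sigmoid(self.decoder(h))  # per-voxel occupancy in [0, 1]

# Usage: five views of one object -> a 16^3 occupancy grid per sample.
model = Recon3DSketch()
voxels = model(torch.randn(2, 5, 3, 127, 127))
print(voxels.shape)  # torch.Size([2, 1, 16, 16, 16])
```

Because the recurrent state is a fixed-size 3D grid, the same model ingests one view or many, which mirrors the unified single- and multi-view setting the abstract claims.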
Keywords: Multi-view · Reconstruction · Recurrent neural network

1 Introduction
Rapid and automatic 3D object prototyping has become a game-changing innovation in many applications related to e-commerce, visualization, and architecture, to name a few. This trend has been boosted now that 3D printing is a democratized technology and 3D acquisition methods are accurate and efficient [15]. Moreover, the trend is coupled with the diffusion of large-scale repositories of 3D object models such as ShapeNet [13]. Most of the state-of-the-art methods for 3D object reconstruction, however, are subject to a number of restrictions: (i) objects must be observed from a dense set of views, or equivalently, views must have a relatively small baseline. This is an issue when users wish to reconstruct
the object from just a handful of views, or ideally just one view (see Fig. 1(a)); (ii) objects' appearances (or their reflectance functions) are expected to be Lambertian (i.e., non-reflective) and their albedos are supposed to be non-uniform (i.e., rich in non-homogeneous textures). These restrictions stem from a number of key technical assumptions. One typical assumption is that features can be matched across views [4,18,21,35], as hypothesized by the majority of methods based on SFM or SLAM [22,24]. It has been demonstrated (see, for instance, [37]) that if the viewpoints are separated by a large baseline, establishing (traditional) feature correspondences is extremely problematic due to local appearance changes or self-occlusions. Moreover, lack of texture on objects and specular reflections also make the feature matching problem very difficult [9,43]. In order to