Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images
Haozhe Xie1,2,4 · Hongxun Yao1,2 · Shengping Zhang3,7 · Shangchen Zhou6 · Wenxiu Sun5
Received: 24 December 2019 / Accepted: 12 June 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Recovering the 3D shape of an object from single or multiple images with deep neural networks has attracted increasing attention in the past few years. Mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse the feature maps of input images. However, RNN-based approaches cannot produce consistent reconstruction results when the same input images are given in different orders. Moreover, RNNs may forget important features from early input images due to long-term memory loss. To address these issues, we propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes, producing a fused 3D volume. To further correct wrongly recovered parts in the fused 3D volume, a refiner is adopted to generate the final output. Experimental results on the ShapeNet, Pix3D, and Things3D benchmarks show that Pix2Vox++ performs favorably against state-of-the-art methods in terms of both accuracy and efficiency.

Keywords 3D object reconstruction · Multi-scale · Context-aware · Convolutional neural network
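To make the fusion idea concrete, below is a minimal PyTorch-style sketch of context-aware fusion as described in the abstract: score each per-view coarse volume voxel-wise, normalize the scores across views with a softmax, and blend the volumes with the resulting weights. This is our own illustrative sketch, not the paper's released code; the class name `ContextAwareFusion`, the layer widths, and the context-channel count are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextAwareFusion(nn.Module):
    """Sketch of context-aware fusion: per-view score volumes,
    softmax across views, weighted sum of coarse volumes.
    Layer sizes are illustrative, not the paper's exact config."""

    def __init__(self, in_channels=9):
        super().__init__()
        # small 3D conv net mapping a coarse volume plus its
        # context features to a single-channel score volume
        self.score_net = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv3d(16, 8, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv3d(8, 1, kernel_size=3, padding=1),
        )

    def forward(self, coarse_volumes, context_features):
        # coarse_volumes:   (B, V, 1, D, H, W), one coarse volume per view
        # context_features: (B, V, C-1, D, H, W), decoder context per view
        B, V = coarse_volumes.shape[:2]
        x = torch.cat([coarse_volumes, context_features], dim=2)
        scores = self.score_net(x.flatten(0, 1))            # (B*V, 1, D, H, W)
        scores = scores.view(B, V, *scores.shape[1:])       # (B, V, 1, D, H, W)
        weights = F.softmax(scores, dim=1)                  # normalize over views
        fused = (weights * coarse_volumes).sum(dim=1)       # (B, 1, D, H, W)
        return fused


# toy usage: batch of 2, 3 views, 32^3 volumes, 8 context channels
fusion = ContextAwareFusion(in_channels=9)
coarse = torch.rand(2, 3, 1, 32, 32, 32)
ctx = torch.rand(2, 3, 8, 32, 32, 32)
fused = fusion(coarse, ctx)  # -> (2, 1, 32, 32, 32)
```

Because the softmax runs over the view dimension, the fused volume is invariant to the order of the input images, which is precisely the consistency property that RNN-based fusion lacks.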
Communicated by Thomas Brox.
Corresponding author: Hongxun Yao ([email protected])
Haozhe Xie ([email protected])
Shengping Zhang ([email protected])
Shangchen Zhou ([email protected])
Wenxiu Sun ([email protected])
1 State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
2 School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
3 School of Computer Science and Technology, Harbin Institute of Technology, Weihai, China
4 SenseTime Research, Shenzhen, China
5 SenseTime Research, Hong Kong SAR, China
6 Nanyang Technological University, Singapore, Singapore
7 Peng Cheng Laboratory, Shenzhen, China
1 Introduction

Inferring the complete and precise 3D shape of an object is essential in robotics, 3D modeling and animation, object recognition, and medical diagnosis. Traditional methods, such as Structure from Motion (SfM) (Özyeşil et al. 2017) and Simultaneous Localization and Mapping (SLAM) (Fuentes-Pacheco et al. 2015), match features across images captured from slightly different viewpoints and then use the triangulation principle to recover the 3D coordinates of the matched pixels. Although these methods can produce 3D reconstructions of satisfactory quality, they typically require multiple images of the same object captured by well-calibrated cameras, which is not practical or feasible in some situations (Yang et al. 2019). Recently, several deep learning-based approaches have therefore been proposed to recover the 3D shape of an object directly from single or multiple images.
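As a concrete illustration of the triangulation principle these traditional pipelines rely on, here is a minimal NumPy sketch of standard two-view linear (DLT) triangulation. The function name and interface are illustrative, not taken from the paper.

```python
import numpy as np


def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3D point from its pixel
    coordinates in two calibrated views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: (u, v) pixel coordinates of the same matched feature.
    """
    # each view contributes two linear constraints on the
    # homogeneous 3D point X: u * (P[2] @ X) = P[0] @ X, etc.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # the homogeneous solution is the right singular vector of A
    # associated with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize to Euclidean coordinates
```

The same constraints extend directly to more than two views by stacking additional rows into A, which is why SfM pipelines can fuse many calibrated observations of the same feature.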