Inferring 3D Shapes from Image Collections Using Adversarial Networks


Matheus Gadelha1 · Aartika Rai1 · Subhransu Maji1 · Rui Wang1

Received: 16 May 2019 / Accepted: 25 April 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract We investigate the problem of learning a probabilistic distribution over three-dimensional shapes given two-dimensional views of multiple objects taken from unknown viewpoints. Our approach, called projective generative adversarial network (PrGAN), trains a deep generative model of 3D shapes whose projections (or renderings) match the distribution of the provided 2D views. The addition of a differentiable projection module allows us to infer the underlying 3D shape distribution without access to any explicit 3D or viewpoint annotation during the learning phase. We show that our approach produces 3D shapes of comparable quality to GANs trained directly on 3D data. Experiments also show that the disentangled representation of 2D shapes into geometry and viewpoint leads to a good generative model of 2D shapes. The key advantage of our model is that it estimates 3D shape and viewpoint, and generates novel views, from an input image in a completely unsupervised manner. We further investigate how the generative models can be improved if additional information such as depth, viewpoint or part segmentations is available at training time. To this end, we present new differentiable projection operators that can be used to learn better 3D generative models. Our experiments show that PrGAN can successfully leverage extra visual cues to create more diverse and accurate shapes.

Keywords 3D generative models · Unsupervised learning · Differentiable rendering · Adversarial networks
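To make the idea of a differentiable projection module concrete, the following is a minimal sketch and not the authors' implementation. It assumes the 3D shape is a voxel occupancy grid with values in [0, 1], that the view direction is fixed along the first axis of the grid, and that a pixel's silhouette value is 1 - exp(-s), where s is the sum of occupancies along the ray through that pixel. The function name `project_silhouette` and the tiny example grid are hypothetical.

```python
import math


def project_silhouette(voxels):
    """Project a D x H x W occupancy grid along its first (depth) axis.

    Assumed rule for this sketch: pixel = 1 - exp(-sum of occupancies
    along the ray). The value is 0 for an empty ray and approaches 1 as
    more occupied voxels lie on the ray, and it is smooth in every voxel.
    """
    depth = len(voxels)
    height = len(voxels[0])
    width = len(voxels[0][0])
    image = []
    for i in range(height):
        row = []
        for j in range(width):
            # Accumulate occupancy along the ray through pixel (i, j).
            ray_sum = sum(voxels[k][i][j] for k in range(depth))
            row.append(1.0 - math.exp(-ray_sum))
        image.append(row)
    return image


# A 2 x 2 x 2 grid with a single occupied voxel at depth 1, row 0, column 1.
grid = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
]
silhouette = project_silhouette(grid)
```

Because every pixel is a smooth function of the voxel values, a loss computed on the 2D image can in principle be propagated back to the 3D shape, which is the property the abstract relies on. Handling unknown viewpoints would additionally require rotating the grid before projecting, which this sketch omits.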

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.

Matheus Gadelha [email protected] · Aartika Rai [email protected] · Subhransu Maji [email protected] · Rui Wang [email protected]

1 College of Information and Computer Sciences, University of Massachusetts Amherst, 140 Governors Dr, Amherst, MA 01003, USA

1 Introduction

The ability to infer 3D shapes of objects from their 2D views is one of the central challenges in computer vision. For example, when presented with a catalogue of airplane silhouettes as shown in the top of Fig. 1, one can mentally infer their 3D shapes by simultaneously reasoning about the shape and viewpoint variability. In this work, we investigate the problem of learning a generative model of 3D shapes from a collection of images of an unknown set of objects within a category taken from an unknown set of views. The images can be thought of as generalized projections of 3D shapes into a 2D space in the form of silhouettes, depth maps, or even part segmentations. The problem is challenging as one is not provided with the information about which object instance was used to generate each image, the viewpoint from which each image was taken, the parameterization of the underlying shape distribution, or even the number of un