Generative Image Modeling Using Style and Structure Adversarial Networks

Abstract. Current generative frameworks use end-to-end learning and generate images by sampling from a uniform noise distribution. However, these approaches ignore the most basic principle of image formation: images are the product of (a) Structure, the underlying 3D model, and (b) Style, the texture mapped onto that structure. In this paper, we factorize the image generation process and propose the Style and Structure Generative Adversarial Network (S²-GAN). Our S²-GAN has two components: the Structure-GAN generates a surface normal map; the Style-GAN takes the surface normal map as input and generates the 2D image. In addition to the real vs. generated loss function, we use an additional loss on surface normals computed from the generated images. The two GANs are first trained independently and then merged via joint learning. We show that our S²-GAN model is interpretable, generates more realistic images, and can be used to learn unsupervised RGBD representations.
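To make the factorization concrete, the sketch below shows a minimal two-stage sampling pipeline in the spirit of the abstract: noise drives a structure generator that outputs a surface normal map, which then conditions a style generator that outputs the RGB image. The module names (StructureGenerator, StyleGenerator), the DCGAN-style layer choices, and the 64x64 resolution are illustrative assumptions, not the paper's exact architecture.

# Minimal sketch of the factored generation pipeline:
# noise -> Structure-GAN -> surface normal map -> Style-GAN -> RGB image.
# Architectures, layer sizes, and resolutions are illustrative assumptions.
import torch
import torch.nn as nn

class StructureGenerator(nn.Module):
    """Hypothetical DCGAN-style generator: noise vector -> 3-channel normal map."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),  # normal components in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class StyleGenerator(nn.Module):
    """Hypothetical encoder-decoder: (normal map, noise) -> RGB image."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(128 + z_dim, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.Conv2d(64, 3, 3, 1, 1), nn.Tanh(),
        )

    def forward(self, normals, z):
        h = self.encode(normals)
        # Broadcast the style noise spatially and fuse it with the structure features.
        z_map = z.view(z.size(0), -1, 1, 1).expand(-1, -1, h.size(2), h.size(3))
        return self.decode(torch.cat([h, z_map], dim=1))

# End-to-end sampling: two independent noise vectors drive structure and style.
structure_gan = StructureGenerator()
style_gan = StyleGenerator()
z_structure = torch.rand(4, 100) * 2 - 1   # uniform noise in [-1, 1]
z_style = torch.rand(4, 100) * 2 - 1
normal_map = structure_gan(z_structure)    # (4, 3, 64, 64) surface normals
image = style_gan(normal_map, z_style)     # (4, 3, 64, 64) RGB image

A design point worth noting: because the style stage is conditioned only on a normal map, it could in principle be driven by generated, ground-truth, or synthetically rendered normals (cf. Fig. 1(c)).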

1 Introduction

Unsupervised learning of visual representations is one of the most fundamental problems in computer vision. There are two common approaches for unsupervised learning: (a) using a discriminative framework with auxiliary tasks where supervision comes for free, such as context prediction [1,2] or temporal embedding [3–8]; (b) using a generative framework where the underlying model is compositional and attempts to generate realistic images [9–12]. The underlying hypothesis of the generative framework is that if the model is good enough to generate novel and realistic images, it should also be a good representation for vision tasks. Most of these generative frameworks use end-to-end learning to generate RGB images from control parameters (z, also called noise since it is sampled from a uniform distribution). Recently, some impressive results [13] have been shown on restrictive domains such as faces and bedrooms. However, these approaches ignore one of the most basic underlying principles of image formation. Images are a product of two separate phenomena: Structure, which encodes the underlying geometry of the scene (the underlying mesh, voxel representation, etc.), and Style, which encodes the texture on the objects and the illumination. In this paper, we build upon this IM101 principle of image formation and factor the generative adversarial network (GAN) into two generative processes, as shown in Fig. 1. The first, a structure generative model (namely the Structure-GAN), generates a surface normal map from the sampled noise.
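The abstract also mentions an additional loss on surface normals recovered from the generated images, which couples the two generative processes during joint learning. The following is a minimal sketch of what such a generator objective could look like, assuming a non-saturating adversarial loss, a differentiable normal estimator estimate_normals, and a per-pixel cosine consistency term; these specific choices are assumptions for illustration, not the paper's exact formulation.

# Illustrative generator objective: adversarial loss plus a surface-normal
# consistency term between the Style-GAN's input normals and normals
# re-estimated from the generated image (formulation is an assumption).
import torch
import torch.nn.functional as F

def style_gan_generator_loss(d_fake_logits, input_normals, generated_image,
                             estimate_normals, lam=1.0):
    """d_fake_logits:   discriminator logits on generated images.
    input_normals:    normal maps fed to the Style-GAN, shape (N, 3, H, W).
    estimate_normals: any differentiable network mapping images -> normal maps."""
    # Standard non-saturating GAN loss: push generated samples toward "real".
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    # Surface-normal consistency: normals recovered from the generated image
    # should agree per pixel (cosine similarity) with the conditioning normals.
    pred = F.normalize(estimate_normals(generated_image), dim=1)
    target = F.normalize(input_normals, dim=1)
    normal_consistency = (1.0 - (pred * target).sum(dim=1)).mean()
    return adv + lam * normal_consistency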

Fig. 1. (a) Generative Pipeline: Given ẑ sampled from a uniform distribution, our Structure-GAN generates a surface normal map as output. This surface normal map is then given as input to the Style-GAN, which generates a natural indoor scene image. (b) Generated Examples. (c) Synthetic Scenes Rendering.
