Attribute2Image: Conditional Image Generation from Visual Attributes
1 Computer Science and Engineering, University of Michigan, Ann Arbor, USA ({xcyan,honglak}@umich.edu)
2 Adobe Research, San Francisco, USA ([email protected])
3 NEC Labs, Cupertino, USA ([email protected])
Abstract. This paper investigates a novel problem of generating images from visual attributes. We model the image as a composite of foreground and background and develop a layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder. We experiment with natural images of faces and birds and demonstrate that the proposed models are capable of generating realistic and diverse samples with disentangled latent representations. We use a general energy minimization algorithm for posterior inference of latent variables given novel images. As a result, the learned generative models show excellent quantitative and visual results in the tasks of attribute-conditioned image reconstruction and completion.
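As a concrete illustration, the following is a minimal PyTorch sketch of an attribute-conditioned VAE with a layered foreground/background decoder, in the spirit of the model the abstract describes; it is not the authors' released implementation. The decoder emits a foreground image, a background image, and a per-pixel gating mask, and composes them as gate * fg + (1 - gate) * bg. The attribute dimensionality, network widths, and 64x64 resolution are illustrative assumptions.

```python
# A minimal sketch, assuming PyTorch, of an attribute-conditioned VAE with a
# layered foreground/background decoder. NOT the authors' released code;
# attr_dim, latent_dim, network widths, and resolution are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayeredCVAE(nn.Module):
    def __init__(self, attr_dim=73, latent_dim=256, img_size=64, hid=512):
        super().__init__()
        self.img_size = img_size
        # Encoder q(z | x, y): flattened image concatenated with attributes y.
        self.encoder = nn.Sequential(
            nn.Linear(3 * img_size**2 + attr_dim, hid), nn.ReLU(),
            nn.Linear(hid, 2 * latent_dim),           # outputs [mu, log_var]
        )
        # Decoder p(x | z, y): emits foreground, background, and a gating mask.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + attr_dim, hid), nn.ReLU(),
            nn.Linear(hid, 7 * img_size**2),          # 3 fg + 3 bg + 1 mask
        )

    def decode(self, z, y):
        n, s = self.img_size**2, self.img_size
        h = torch.sigmoid(self.decoder(torch.cat([z, y], dim=1)))
        fg = h[:, :3 * n].view(-1, 3, s, s)           # foreground layer
        bg = h[:, 3 * n:6 * n].view(-1, 3, s, s)      # background layer
        gate = h[:, 6 * n:].view(-1, 1, s, s)         # per-pixel mask
        # Layered composition: the mask selects foreground over background.
        return gate * fg + (1.0 - gate) * bg

    def forward(self, x, y):
        mu, log_var = self.encoder(
            torch.cat([x.flatten(1), y], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        x_hat = self.decode(z, y)
        # Negative ELBO: reconstruction + KL(q(z|x,y) || N(0, I)), per example.
        rec = F.mse_loss(x_hat, x, reduction="sum") / x.size(0)
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum() / x.size(0)
        return rec + kl
```

Training such a model end-to-end amounts to minimizing the returned negative ELBO over image/attribute pairs; the gating mask is what lets the latent variables disentangle foreground content from background.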
1 Introduction
Generative image modeling is of fundamental interest in computer vision and machine learning. Early works [20,21,26,30,32,36] studied statistical and physical principles of building generative models, but due to the lack of effective feature representations, their results were limited to textures or particular patterns such as well-aligned faces. Recent advances in representation learning with deep neural networks [16,29] have given rise to a series of deep generative models that combine generative modeling and representation learning through Bayesian inference [1,9,14,15,28,34] or adversarial training [3,8]. These works show promising results in generating natural images, but the samples are still low-resolution and far from perfect because of the fundamental challenges of learning unconditioned generative models of images. In this paper, we are interested in generating object images from a high-level description. For example, we would like to generate portrait images that all match the description "a young girl with brown hair is smiling" (Fig. 1).
Fig. 1. An example that demonstrates the problem of conditioned image generation from visual attributes. We assume a vector of visual attributes is extracted from a natural language description, and then this attribute vector is combined with learned latent factors to generate diverse image samples.
This conditioned treatment reduces sampling uncertainty and helps generate more realistic images, and thus has potential real-world applications such as forensic art and semantic photo editing [12,19,40]. The high-level descriptions are usually given in natural language, but what underlies them can be extracted as a vector of visual attributes.
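To make the conditioned treatment concrete, the snippet below reuses the hypothetical LayeredCVAE sketch from above: it draws several samples for one fixed attribute vector (diversity comes from the latent prior), then recovers a latent code for a novel image by gradient-based energy minimization, a simple stand-in for the optimization-based posterior inference mentioned in the abstract. The attribute values, step count, and prior weight are placeholder assumptions.

```python
# Hypothetical usage of the LayeredCVAE sketch above (placeholder values).
model = LayeredCVAE()
y = torch.rand(1, 73)                          # one attribute vector

# Diversity: fix the attributes, vary the latent code.
with torch.no_grad():
    z = torch.randn(8, 256)
    samples = model.decode(z, y.expand(8, -1)) # 8 distinct images, one description

# Posterior inference for a novel image x: minimize an energy over z
# (a gradient-based stand-in for the paper's optimization-based inference).
for p in model.parameters():
    p.requires_grad_(False)                    # only z is optimized
x_novel = torch.rand(1, 3, 64, 64)
z_opt = torch.zeros(1, 256, requires_grad=True)
opt = torch.optim.Adam([z_opt], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    energy = F.mse_loss(model.decode(z_opt, y), x_novel) \
             + 1e-3 * z_opt.pow(2).sum()       # keep z near the N(0, I) prior
    energy.backward()
    opt.step()
```

The recovered z_opt can then be decoded under edited attributes, which is the mechanism behind the attribute-conditioned reconstruction and completion tasks evaluated in the paper.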