Attribute2Image: Conditional Image Generation from Visual Attributes
1 Computer Science and Engineering, University of Michigan, Ann Arbor, USA ({xcyan,honglak}@umich.edu)
2 Adobe Research, San Francisco, USA ([email protected])
3 NEC Labs, Cupertino, USA ([email protected])
Abstract. This paper investigates a novel problem of generating images from visual attributes. We model the image as a composite of foreground and background and develop a layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder. We experiment with natural images of faces and birds and demonstrate that the proposed models are capable of generating realistic and diverse samples with disentangled latent representations. We use a general energy minimization algorithm for posterior inference of latent variables given novel images. As a result, the learned generative models show excellent quantitative and visual results in the tasks of attribute-conditioned image reconstruction and completion.
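As a concrete illustration, the following is a minimal PyTorch sketch of an attribute-conditioned VAE with a layered foreground/background decoder, in the spirit of the model the abstract describes; it is not the authors' released implementation. The decoder emits a foreground image, a background image, and a per-pixel gating mask, and composes them as gate * fg + (1 - gate) * bg. The attribute dimensionality, network widths, and 64x64 resolution are illustrative assumptions.

```python
# A minimal sketch, assuming PyTorch, of an attribute-conditioned VAE with a
# layered foreground/background decoder. NOT the authors' released code;
# attr_dim, latent_dim, network widths, and resolution are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayeredCVAE(nn.Module):
    def __init__(self, attr_dim=73, latent_dim=256, img_size=64, hid=512):
        super().__init__()
        self.img_size = img_size
        # Encoder q(z | x, y): flattened image concatenated with attributes y.
        self.encoder = nn.Sequential(
            nn.Linear(3 * img_size**2 + attr_dim, hid), nn.ReLU(),
            nn.Linear(hid, 2 * latent_dim),           # outputs [mu, log_var]
        )
        # Decoder p(x | z, y): emits foreground, background, and a gating mask.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + attr_dim, hid), nn.ReLU(),
            nn.Linear(hid, 7 * img_size**2),          # 3 fg + 3 bg + 1 mask
        )

    def decode(self, z, y):
        n, s = self.img_size**2, self.img_size
        h = torch.sigmoid(self.decoder(torch.cat([z, y], dim=1)))
        fg = h[:, :3 * n].view(-1, 3, s, s)           # foreground layer
        bg = h[:, 3 * n:6 * n].view(-1, 3, s, s)      # background layer
        gate = h[:, 6 * n:].view(-1, 1, s, s)         # per-pixel mask
        # Layered composition: the mask selects foreground over background.
        return gate * fg + (1.0 - gate) * bg

    def forward(self, x, y):
        mu, log_var = self.encoder(
            torch.cat([x.flatten(1), y], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        x_hat = self.decode(z, y)
        # Negative ELBO: reconstruction + KL(q(z|x,y) || N(0, I)), per example.
        rec = F.mse_loss(x_hat, x, reduction="sum") / x.size(0)
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum() / x.size(0)
        return rec + kl
```

Training such a model end-to-end amounts to minimizing the returned negative ELBO over image/attribute pairs; the gating mask is what lets the latent variables disentangle foreground content from background.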
1 Introduction
Generative image modeling is of fundamental interest in computer vision and machine learning. Early works [20,21,26,30,32,36] studied statistical and physical principles of building generative models, but due to the lack of effective feature representations, their results were limited to textures or particular patterns such as well-aligned faces. Recent advances in representation learning with deep neural networks [16,29] have given rise to a series of deep generative models that combine generative modeling and representation learning through Bayesian inference [1,9,14,15,28,34] or adversarial training [3,8]. These works show promising results in generating natural images, but the samples are still low-resolution and far from perfect because of the fundamental challenges of learning unconditioned generative models of images. In this paper, we are interested in generating object images from a high-level description. For example, we would like to generate portrait images that all match the description "a young girl with brown hair is smiling" (Fig. 1).
Fig. 1. An example that demonstrates the problem of conditioned image generation from visual attributes. We assume a vector of visual attributes is extracted from a natural language description, and then this attribute vector is combined with learned latent factors to generate diverse image samples.
This conditioned treatment reduces sampling uncertainty and helps generate more realistic images, and thus has potential real-world applications such as forensic art and semantic photo editing [12,19,40]. The high-level descriptions are usually given in natural language, but what underlies them can be extracted as a vector of visual attributes.
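To make the conditioned treatment concrete, the snippet below reuses the hypothetical LayeredCVAE sketch from above: it draws several samples for one fixed attribute vector (diversity comes from the latent prior), then recovers a latent code for a novel image by gradient-based energy minimization, a simple stand-in for the optimization-based posterior inference mentioned in the abstract. The attribute values, step count, and prior weight are placeholder assumptions.

```python
# Hypothetical usage of the LayeredCVAE sketch above (placeholder values).
model = LayeredCVAE()
y = torch.rand(1, 73)                          # one attribute vector

# Diversity: fix the attributes, vary the latent code.
with torch.no_grad():
    z = torch.randn(8, 256)
    samples = model.decode(z, y.expand(8, -1)) # 8 distinct images, one description

# Posterior inference for a novel image x: minimize an energy over z
# (a gradient-based stand-in for the paper's optimization-based inference).
for p in model.parameters():
    p.requires_grad_(False)                    # only z is optimized
x_novel = torch.rand(1, 3, 64, 64)
z_opt = torch.zeros(1, 256, requires_grad=True)
opt = torch.optim.Adam([z_opt], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    energy = F.mse_loss(model.decode(z_opt, y), x_novel) \
             + 1e-3 * z_opt.pow(2).sum()       # keep z near the N(0, I) prior
    energy.backward()
    opt.step()
```

The recovered z_opt can then be decoded under edited attributes, which is the mechanism behind the attribute-conditioned reconstruction and completion tasks evaluated in the paper.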