Layout2image: Image Generation from Layout



Bo Zhao1,2,3 · Weidong Yin1,3 · Lili Meng1 · Leonid Sigal1,3

Received: 14 April 2019 / Accepted: 2 February 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Despite significant recent progress on generative models, controlled generation of images depicting multiple objects in complex layouts remains a difficult problem. Among the core challenges are the diversity of appearance a given object may possess and, as a result, the exponentially large set of images consistent with a specified layout. To address these challenges, we propose a novel approach for layout-based image generation, which we call Layout2Im. Given a coarse spatial layout (bounding boxes + object categories), our model can generate a set of realistic images with the correct objects in the desired locations. The representation of each object is disentangled into a specified/certain part (category) and an unspecified/uncertain part (appearance). The category is encoded using a word embedding, and the appearance is distilled into a low-dimensional vector sampled from a normal distribution. Individual object representations are composed together using a convolutional LSTM to obtain an encoding of the complete layout, which is then decoded to an image. Several loss terms are introduced to encourage accurate and diverse image generation. The proposed Layout2Im model significantly outperforms the previous state-of-the-art, boosting the best reported inception score by 24.66% and 28.57% on the very challenging COCO-Stuff and Visual Genome datasets, respectively. Extensive experiments also demonstrate our model's ability to generate complex and diverse images with many objects.

Keywords Scene image generation · Image translation · Image generation · Generative adversarial networks
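The disentangled object representation described above can be illustrated with a minimal numpy sketch. All dimensions, the embedding table, and the final summation are illustrative assumptions only; in the actual model the embedding is learned, and the per-object feature maps are fused by a convolutional LSTM rather than summed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration (not taken from the paper).
NUM_CATEGORIES = 10   # vocabulary of object categories
CAT_DIM = 8           # word-embedding size for the category (certain part)
APP_DIM = 4           # low-dimensional appearance code (uncertain part)
H, W = 16, 16         # spatial resolution of the layout feature map

# Random table standing in for a learned category word embedding.
embedding = rng.normal(size=(NUM_CATEGORIES, CAT_DIM))

def encode_object(category_id):
    """Disentangled object code: category embedding (specified)
    concatenated with an appearance vector sampled from N(0, I)."""
    cat_vec = embedding[category_id]            # specified/certain part
    app_vec = rng.standard_normal(APP_DIM)      # unspecified/uncertain part
    return np.concatenate([cat_vec, app_vec])   # shape (CAT_DIM + APP_DIM,)

def place_in_layout(obj_code, bbox):
    """Broadcast the object code over its bounding-box region,
    producing a per-object feature map of shape (C, H, W)."""
    x0, y0, x1, y1 = bbox
    fmap = np.zeros((obj_code.size, H, W))
    fmap[:, y0:y1, x0:x1] = obj_code[:, None, None]
    return fmap

# Two objects (category id, bounding box) in one layout. The full model
# fuses these maps with a convolutional LSTM; summation is a stand-in.
objects = [(3, (2, 2, 8, 8)), (7, (6, 6, 14, 14))]
layout_map = sum(place_in_layout(encode_object(c), b) for c, b in objects)
print(layout_map.shape)  # (12, 16, 16)
```

Sampling a new appearance vector for the same layout yields a different `layout_map`, which is what lets the model generate a diverse set of images consistent with one layout.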

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.

Bo Zhao and Weidong Yin have contributed equally to this work.

Bo Zhao [email protected] · Weidong Yin [email protected] · Lili Meng [email protected] · Leonid Sigal [email protected]

1 Department of Computer Science, The University of British Columbia, Vancouver, Canada
2 Bank of Montreal AI, Toronto, Canada
3 Vector Institute, Toronto, Canada

1 Introduction

Image generation of complex realistic scenes with multiple objects and desired layouts is one of the core frontiers for computer vision. Existence of such algorithms would not

only inform our designs of inference mechanisms needed for visual understanding, but also provide practical benefits in automatic image generation for artists and users. In fact, such algorithms, if successful, may replace visual search and retrieval engines in their entirety. Why search the web for an image if you can create one to a user's specification? For these reasons, image generation algorithms have been a major focus of recent research. Of specific relevance are approaches for text-to-image (Hong et al. 2018; Karacan et al. 2016; Mansimov et al. 2015; Re