Conditional Image Synthesis Using Stacked Auxiliary Classifier Generative Adversarial Networks

1 Department of Computing, Imperial College London, London, UK
[email protected]
2 Data Science Institute, Imperial College London, London, UK
{hao.dong11,fangde.liu,y.guo}@imperial.ac.uk

Abstract. Synthesizing photo-realistic images has been a long-standing challenge in image processing and could provide crucial approaches for dataset augmentation and balancing. Traditional methods have trouble dealing with the rich and complicated structural information of objects resulting from variations in colors, poses, textures and illumination. Recent advances in deep learning present a new perspective on this task. The aim of our paper is to apply state-of-the-art generative models to synthesize diverse and realistic high-resolution images. Extensive experiments have been conducted on the CelebA dataset, a large-scale face-attributes dataset with more than 200 thousand celebrity images, each with 40 attribute labels. Inspired by existing architectures, we present the stacked Auxiliary Classifier Generative Adversarial Network (Stack-ACGAN) for image synthesis given conditioning labels, which generates low-resolution images (e.g. 64 × 64) that sketch basic shapes and colors in Stage-I and high-resolution images (e.g. 256 × 256) with plausible details in Stage-II. Inception scores and Multi-Scale Structural Similarity (MS-SSIM) are computed to evaluate the synthesized images. Both quantitative and qualitative analyses show that the proposed model is capable of generating diverse and realistic images.

Keywords: High-resolution image synthesis · Generative adversarial networks · Deep learning

1 Introduction

Generating photo-realistic images in high resolution is a challenging task with numerous applications, including dataset augmentation and computer-aided design and manufacturing. However, even the most advanced generative models fail to generate plausible images from conditional information, especially in high resolution, because the target data space is multi-modal. In other words, there are many possible images that correctly match specific conditional labels.

Recently, the Generative Adversarial Network (GAN) [1] has attracted great attention in the field of image synthesis, and a large number of GAN variants ([2]) have been proposed that are capable of generating sharper images. However, due to the unstable training of GANs, most existing GAN networks generate relatively low-resolution images (e.g. 64 × 64), and the details and object parts added by super-resolution approaches are limited, so large defects in the low-resolution images can hardly be rectified. Therefore, synthesizing high-resolution images with photo-realistic details remains an open challenge. To solve this problem, we propose the Stacked Auxiliary Classifier Generative Adversarial Network (Stack-ACGAN), which divides the synthesis process into two stages: Stage-I sketches the basic shapes and colors of the object in a low-resolution image, and Stage-II refines this sketch into a high-resolution image with plausible details.
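
To make the two-stage idea concrete, the following is a minimal PyTorch-style sketch of such a pipeline: a Stage-I generator maps a noise vector plus attribute labels to a 64 × 64 image, and a Stage-II generator refines that output, again conditioned on the same labels, into a 256 × 256 image. All module names, layer choices and hyper-parameters here are illustrative assumptions and do not reproduce the exact architecture described in this paper.

```python
# Illustrative sketch only: module names and layer sizes are assumptions,
# not the architecture reported in this paper.
import torch
import torch.nn as nn


class StageIGenerator(nn.Module):
    """Noise vector + attribute labels -> 64x64 RGB image."""

    def __init__(self, z_dim=100, label_dim=40, feat=64):
        super().__init__()
        def up(c_in, c_out, k, s, p):
            return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, k, s, p),
                                 nn.BatchNorm2d(c_out), nn.ReLU(True))
        self.net = nn.Sequential(
            up(z_dim + label_dim, feat * 8, 4, 1, 0),  # 1x1 -> 4x4
            up(feat * 8, feat * 4, 4, 2, 1),           # 4x4 -> 8x8
            up(feat * 4, feat * 2, 4, 2, 1),           # 8x8 -> 16x16
            up(feat * 2, feat, 4, 2, 1),               # 16x16 -> 32x32
            nn.ConvTranspose2d(feat, 3, 4, 2, 1),      # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, z, labels):
        x = torch.cat([z, labels], dim=1)[:, :, None, None]  # (B, z+label, 1, 1)
        return self.net(x)                                    # (B, 3, 64, 64)


class StageIIGenerator(nn.Module):
    """64x64 Stage-I image + the same labels -> 256x256 RGB image."""

    def __init__(self, label_dim=40, feat=64):
        super().__init__()
        self.encode = nn.Sequential(                      # 64x64 -> 16x16 features
            nn.Conv2d(3 + label_dim, feat, 4, 2, 1), nn.ReLU(True),
            nn.Conv2d(feat, feat * 2, 4, 2, 1), nn.BatchNorm2d(feat * 2), nn.ReLU(True),
        )
        def up(c_in, c_out):
            return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, 2, 1),
                                 nn.BatchNorm2d(c_out), nn.ReLU(True))
        self.decode = nn.Sequential(                      # 16x16 -> 256x256
            up(feat * 2, feat), up(feat, feat), up(feat, feat),
            nn.ConvTranspose2d(feat, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, low_res, labels):
        # Broadcast the label vector over the spatial grid and concatenate with the image.
        lbl = labels[:, :, None, None].expand(-1, -1, low_res.size(2), low_res.size(3))
        return self.decode(self.encode(torch.cat([low_res, lbl], dim=1)))


# Usage: sample 256x256 faces conditioned on 40 binary attributes.
z = torch.randn(8, 100)
labels = torch.randint(0, 2, (8, 40)).float()
low = StageIGenerator()(z, labels)      # Stage-I: rough shapes and colors, 64x64
high = StageIIGenerator()(low, labels)  # Stage-II: plausible details, 256x256
```

The point the sketch illustrates is that the conditioning labels enter both stages, so the second stage can correct label-inconsistent details in the Stage-I sketch rather than merely super-resolving it.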
