Guest Editorial: Generative Adversarial Networks for Computer Vision



Jun-Yan Zhu1 · Hongsheng Li2 · Eli Shechtman1 · Ming-Yu Liu3 · Jan Kautz3 · Antonio Torralba4

Received: 3 September 2020 / Published online: 14 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Jun-Yan Zhu [email protected] · Hongsheng Li [email protected] · Eli Shechtman [email protected] · Ming-Yu Liu [email protected] · Jan Kautz [email protected] · Antonio Torralba [email protected]

1 Adobe Research, San Jose, CA, USA
2 The Chinese University of Hong Kong, Hong Kong SAR, China
3 NVIDIA Research, Santa Clara, CA, USA
4 MIT CSAIL, Cambridge, MA, USA

For decades, the field of computer vision has used generative models for both recognition and synthesis tasks. For example, seminal work adopted probabilistic generative models for image classification (Weber et al. 2000; Fergus et al. 2003; Fei-Fei and Perona 2005), shape perception (Freeman 1994), and digit recognition (Revow et al. 1996; Learned-Miller 2005). Meanwhile, classic generative models, such as Gaussian mixture models and principal component analysis, have long been used to learn prior models for image restoration (Olshausen and Field 1996; Portilla and Simoncelli 2000; Zoran and Weiss 2011), segmentation (Rother et al. 2004), and face modeling (Blanz and Vetter 1999; Cootes et al. 2001). Unfortunately, due to their limited capacity, these models either learn local image statistics of pixel values, gradients, and feature descriptors, or only work well on aligned objects such as digits and faces. None of the above models can learn the distribution of in-the-wild natural images or capture long-range dependencies beyond local regions. As a result, the above work mostly focused on low-level vision and graphics applications. For recognition tasks, classic generative models seldom outperformed discriminative classifiers. For synthesis tasks, these methods struggled to synthesize natural images with the same expressiveness and fidelity as 3D graphics rendering pipelines, with the notable exception of photorealistic face synthesis with Morphable Models (Blanz and Vetter 1999) and Active Appearance Models (Cootes et al. 2001).

Recently, a wide range of deep generative models (Hinton and Salakhutdinov 2006; Goodfellow et al. 2014; Kingma and Welling 2014; Dinh et al. 2016; Van den Oord et al. 2016) have been developed for modeling the distribution of full images. Among them, Generative Adversarial Networks (GANs) (Goodfellow et al. 2014) have been at the forefront of research in the past few years, producing high-quality images while enabling efficient inference. GANs can approximate real data distributions and synthesize realistic data samples. Learning is carried out through a two-player game between a generator that synthesizes an image and a discriminator that distinguishes real images from synthetic ones. Compared to prior work (Tu 2007), the success of GANs partly comes from high-capacity CNN classifiers (Krizhevsky et al. 2012) that can easily
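As a rough illustration of this two-player game, the GAN value function of Goodfellow et al. (2014), V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))], can be evaluated on a toy one-dimensional problem. The affine "generator", logistic "discriminator", and all parameter values below are hypothetical stand-ins for real neural networks, chosen only to make the minimax structure concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_real(n):
    """Toy 1-D 'real' data distribution: N(3, 1)."""
    return rng.normal(3.0, 1.0, size=n)

def generator(z, theta):
    """Hypothetical generator: an affine map applied to noise z ~ N(0, 1)."""
    a, b = theta
    return a * z + b

def discriminator(x, phi):
    """Hypothetical discriminator: logistic regression on the scalar input."""
    w, c = phi
    return 1.0 / (1.0 + np.exp(-(w * x + c)))

def value(phi, theta, n=10_000):
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    x = sample_real(n)
    z = rng.normal(size=n)
    d_real = discriminator(x, phi)
    d_fake = discriminator(generator(z, theta), phi)
    return np.mean(np.log(d_real)) + np.mean(np.log1p(-d_fake))

# The discriminator plays to maximize V; the generator plays to minimize it.
phi = (1.0, -3.0)        # discriminator centered on the real data mode
theta_bad = (1.0, -3.0)  # generator far from the data: samples ~ N(-3, 1)
theta_good = (1.0, 3.0)  # generator matching the data: samples ~ N(3, 1)

# A generator that matches the data lowers V: its samples are harder
# for this discriminator to tell apart from real ones.
assert value(phi, theta_good) < value(phi, theta_bad)
```

In actual GAN training both players are deep networks updated alternately by stochastic gradient steps on this objective, rather than compared at fixed parameter settings as in this sketch.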