SliderGAN: Synthesizing Expressive Face Images by Sliding 3D Blendshape Parameters



Evangelos Ververas¹ · Stefanos Zafeiriou¹

Received: 15 May 2019 / Accepted: 10 May 2020
© The Author(s) 2020

Abstract

Image-to-image (i2i) translation is the dense regression problem of learning how to transform an input image into an output image using aligned image pairs. Remarkable progress has been made in i2i translation with the advent of deep convolutional neural networks, and in particular with the learning paradigm of generative adversarial networks (GANs). In the absence of paired images, i2i translation is tackled with one or multiple domain transformations (i.e., CycleGAN, StarGAN etc.). In this paper, we study the problem of image-to-image translation under a set of continuous parameters that correspond to a model describing a physical process. In particular, we propose SliderGAN, which transforms an input face image into a new one according to the continuous values of a statistical blendshape model of facial motion. We show that it is possible to edit a facial image according to expression and speech blendshapes, using sliders that control the continuous values of the blendshape model. This provides much more flexibility in various tasks, including but not limited to face editing, expression transfer and face neutralisation, compared to models based on discrete expressions or action units.

Keywords: GAN · Image translation · Facial expression synthesis · Speech synthesis · Blendshape models · Action units · 3DMM fitting · Relativistic discriminator · EmotioNet · 4DFAB · LRW
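To make the slider-based conditioning concrete, the following is a minimal PyTorch sketch of a generator driven by a continuous coefficient vector. It is not the authors' architecture: it assumes a StarGAN-style conditioning strategy in which the blendshape coefficients are tiled into extra input channels, and the class name, layer sizes and the 30-dimensional coefficient vector are illustrative placeholders.

```python
import torch
import torch.nn as nn

class BlendshapeConditionedGenerator(nn.Module):
    """Toy generator: a face image is conditioned on a vector of
    continuous blendshape coefficients by tiling the vector into
    extra channels (illustrative only; layer sizes are placeholders)."""

    def __init__(self, num_blendshapes=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_blendshapes, 64, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=7, padding=3),
            nn.Tanh(),  # output image in [-1, 1]
        )

    def forward(self, image, coeffs):
        # image: (B, 3, H, W); coeffs: (B, num_blendshapes) holding the
        # continuous "slider" values. Tile each coefficient over the
        # spatial grid so the convolutions see it at every pixel.
        b, _, h, w = image.shape
        maps = coeffs.view(b, -1, 1, 1).expand(b, coeffs.size(1), h, w)
        return self.net(torch.cat([image, maps], dim=1))

# Usage: "slide" one blendshape coefficient to morph the expression.
g = BlendshapeConditionedGenerator(num_blendshapes=30)
x = torch.randn(1, 3, 128, 128)  # stand-in for a face image
p = torch.zeros(1, 30)
p[0, 4] = 0.7                    # partially activate one hypothetical blendshape
y = g(x, p)                      # edited image, same size as x
```

Because the conditioning vector is continuous rather than a one-hot domain label, intermediate slider values interpolate smoothly between expressions, which is the property that distinguishes this setting from discrete-domain models such as StarGAN.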

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.

✉ Evangelos Ververas, [email protected]
Stefanos Zafeiriou, [email protected]

¹ Department of Computing, Imperial College London, Queens Gate, London SW7 2AZ, UK

1 Introduction

Interactive editing of the expression of a face in an image has countless applications, including but not limited to movie post-production, computational photography and face recognition (e.g. expression neutralisation). In computer graphics, facial motion editing is a popular field; nevertheless, it mainly revolves around constructing person-specific models that require a large number of training samples (Suwajanakorn et al. 2017). Recently, the advent of machine learning, and especially of Deep Convolutional Neural Networks (DCNNs), has provided exciting tools that have made the community re-think the problem. In particular, recent advances in Generative Adversarial Networks (GANs) provide very exciting solutions for image-to-image (i2i) translation. i2i translation, i.e. the problem of learning how to transform aligned image pairs, has attracted a lot of attention during the last few years (Isola et al. 2017; Zhu et al. 2017; Choi et al. 2018). The so-called pix2pix model and alternatives demonstrated excellent results in image completion etc. (Isola et al. 2017). In order to perform i2i translation in the absence of image pairs, the so-called CycleGAN was proposed, which introduced a