DRIT++: Diverse Image-to-Image Translation via Disentangled Representations
Hsin-Ying Lee · Hung-Yu Tseng · Qi Mao · Jia-Bin Huang · Yu-Ding Lu · Maneesh Singh · Ming-Hsuan Yang

Received: 26 April 2019 / Accepted: 15 December 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.
Hsin-Ying Lee, Hung-Yu Tseng and Qi Mao have contributed equally to this work.
Abstract

Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for this task: (1) the lack of aligned training pairs and (2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for generating diverse outputs without paired training images. To synthesize diverse outputs, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the content features extracted from a given input and attribute vectors sampled from the attribute space to synthesize diverse outputs at test time. To handle unpaired training data, we introduce a cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative evaluation, we measure realism with a user study and the Fréchet inception distance, and measure diversity with the perceptual distance metric, the Jensen–Shannon divergence, and the number of statistically-different bins.
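As a concrete illustration of the test-time generation step described in the abstract, the sketch below (in PyTorch) encodes an input image into the content space and then pairs that content with several randomly sampled attribute vectors to produce diverse outputs. It is a minimal, assumed sketch rather than the authors' released implementation; the architectures, layer sizes, and names (ContentEncoder, Generator, ATTR_DIM) are illustrative.

```python
# Minimal sketch of diverse test-time generation from a disentangled
# representation. All module architectures and names are illustrative
# assumptions, not the authors' released code.
import torch
import torch.nn as nn

ATTR_DIM = 8  # assumed size of the domain-specific attribute vector


class ContentEncoder(nn.Module):
    """Maps an image to a feature map in the domain-invariant content space."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=1, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class Generator(nn.Module):
    """Decodes a content feature map conditioned on an attribute vector."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(ATTR_DIM, 256)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 7, stride=1, padding=3), nn.Tanh(),
        )

    def forward(self, content, attr):
        # Broadcast the attribute vector over spatial positions and add it to
        # the content features (one simple conditioning choice).
        a = self.fc(attr).unsqueeze(-1).unsqueeze(-1)
        return self.net(content + a)


content_encoder = ContentEncoder()
generator_b = Generator()                # generator for the target domain
x_a = torch.randn(1, 3, 64, 64)          # dummy input image from the source domain
content = content_encoder(x_a)

# Sampling several attribute vectors for the same content is what yields the
# diverse outputs mentioned in the abstract.
outputs = [generator_b(content, torch.randn(1, ATTR_DIM)) for _ in range(5)]
```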
1 Introduction
Image-to-image (I2I) translation aims to learn the mapping between different visual domains. Numerous vision and graphics problems can be formulated as I2I translation problems, such as colorization (Larsson et al. 2016; Zhang et al. 2016) (grayscale → color), super-resolution (Lai et al. 2017; Ledig et al. 2017; Li et al. 2016, 2019) (low-resolution → high-resolution), and photorealistic image synthesis (Chen and Koltun 2017; Park et al. 2019; Wang et al. 2018) (label → image). In addition, I2I translation can be applied to synthesize images for domain adaptation (Bousmalis et al. 2017; Chen et al. 2019; Hoffman et al. 2018; Murez et al. 2018; Shrivastava et al. 2017).

Learning the mapping between two visual domains is challenging for two main reasons. First, aligned training image pairs are either difficult to collect (e.g., day scene ↔ night scene) or do not exist (e.g., artwork ↔ real photo). Second, many such mappings are inherently multimodal: a single input may correspond to multiple possible outputs. To handle multimodal translation, one possible approach is to inject a random noise vector into the generator.
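The first challenge, the lack of aligned pairs, is what the cross-cycle consistency loss mentioned in the abstract addresses. The sketch below illustrates one way such a loss can be assembled from disentangled (content, attribute) codes of two unpaired images: content representations are swapped across domains, the translated images are re-encoded, a second swap is required to reconstruct the original inputs, and the reconstruction error becomes the loss. The toy encoder/decoder architectures, the L1 reconstruction term, and all names here are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a cross-cycle consistency loss for unpaired training.
# Architectures and the exact swap/reconstruction scheme are illustrative
# assumptions based on the description in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

ATTR_DIM = 8


class Encoder(nn.Module):
    """Toy per-domain encoder returning a (content, attribute) pair."""

    def __init__(self):
        super().__init__()
        self.content = nn.Conv2d(3, 16, 3, padding=1)
        self.attr = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, ATTR_DIM)
        )

    def forward(self, x):
        return self.content(x), self.attr(x)


class Decoder(nn.Module):
    """Toy per-domain decoder mapping (content, attribute) back to an image."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(ATTR_DIM, 16)
        self.out = nn.Conv2d(16, 3, 3, padding=1)

    def forward(self, content, attr):
        a = self.fc(attr).unsqueeze(-1).unsqueeze(-1)
        return torch.tanh(self.out(content + a))


enc_a, enc_b = Encoder(), Encoder()
dec_a, dec_b = Decoder(), Decoder()

x_a = torch.randn(2, 3, 32, 32)   # unpaired images from domain A
x_b = torch.randn(2, 3, 32, 32)   # unpaired images from domain B

# First translation: swap content across domains (each decoder keeps an
# attribute from its own domain).
c_a, s_a = enc_a(x_a)
c_b, s_b = enc_b(x_b)
u = dec_a(c_b, s_a)               # B-content rendered with an A-attribute
v = dec_b(c_a, s_b)               # A-content rendered with a B-attribute

# Second translation: re-encode the translated images and swap back; the
# round trip should recover the original inputs.
c_u, s_u = enc_a(u)
c_v, s_v = enc_b(v)
x_a_rec = dec_a(c_v, s_u)
x_b_rec = dec_b(c_u, s_v)

# Cross-cycle consistency: reconstructing the unpaired inputs after the round
# trip removes the need for aligned image pairs.
loss_cc = F.l1_loss(x_a_rec, x_a) + F.l1_loss(x_b_rec, x_b)
```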