DRIT++: Diverse Image-to-Image Translation via Disentangled Representations
Hsin-Ying Lee · Hung-Yu Tseng · Qi Mao · Jia-Bin Huang · Yu-Ding Lu · Maneesh Singh · Ming-Hsuan Yang

Received: 26 April 2019 / Accepted: 15 December 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Communicated by Jun-Yan Zhu, Hongsheng Li, Eli Shechtman, Ming-Yu Liu, Jan Kautz, Antonio Torralba.
Hsin-Ying Lee, Hung-Yu Tseng and Qi Mao have contributed equally to this work.
Abstract

Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for this task: (1) the lack of aligned training pairs and (2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for generating diverse outputs without paired training images. To synthesize diverse outputs, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the content features extracted from a given input and attribute vectors sampled from the attribute space to synthesize diverse outputs at test time. To handle unpaired training data, we introduce a cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative evaluation, we measure realism with a user study and the Fréchet inception distance, and measure diversity with the perceptual distance metric, the Jensen–Shannon divergence, and the number of statistically-different bins.
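As a concrete illustration of the test-time generation step described in the abstract, the sketch below (in PyTorch) encodes an input image into the content space and then pairs that content with several randomly sampled attribute vectors to produce diverse outputs. It is a minimal, assumed sketch rather than the authors' released implementation; the architectures, layer sizes, and names (ContentEncoder, Generator, ATTR_DIM) are illustrative.

```python
# Minimal sketch of diverse test-time generation from a disentangled
# representation. All module architectures and names are illustrative
# assumptions, not the authors' released code.
import torch
import torch.nn as nn

ATTR_DIM = 8  # assumed size of the domain-specific attribute vector


class ContentEncoder(nn.Module):
    """Maps an image to a feature map in the domain-invariant content space."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=1, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class Generator(nn.Module):
    """Decodes a content feature map conditioned on an attribute vector."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(ATTR_DIM, 256)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 7, stride=1, padding=3), nn.Tanh(),
        )

    def forward(self, content, attr):
        # Broadcast the attribute vector over spatial positions and add it to
        # the content features (one simple conditioning choice).
        a = self.fc(attr).unsqueeze(-1).unsqueeze(-1)
        return self.net(content + a)


content_encoder = ContentEncoder()
generator_b = Generator()                # generator for the target domain
x_a = torch.randn(1, 3, 64, 64)          # dummy input image from the source domain
content = content_encoder(x_a)

# Sampling several attribute vectors for the same content is what yields the
# diverse outputs mentioned in the abstract.
outputs = [generator_b(content, torch.randn(1, ATTR_DIM)) for _ in range(5)]
```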
1 Introduction
Image-to-image (I2I) translation aims to learn the mapping between different visual domains. Numerous vision and graphics problems can be formulated as I2I translation problems, such as colorization (Larsson et al. 2016; Zhang et al. 2016) (grayscale → color), super-resolution (Lai et al. 2017; Ledig et al. 2017; Li et al. 2016, 2019) (low-resolution → high-resolution), and photorealistic image synthesis (Chen and Koltun 2017; Park et al. 2019; Wang et al. 2018) (label → image). In addition, I2I translation can be applied to synthesize images for domain adaptation (Bousmalis et al. 2017; Chen et al. 2019; Hoffman et al. 2018; Murez et al. 2018; Shrivastava et al. 2017).

Learning the mapping between two visual domains is challenging for two main reasons. First, aligned training image pairs are either difficult to collect (e.g., day scene ↔ night scene) or do not exist (e.g., artwork ↔ real photo). Second, many such mappings are inherently multimodal: a single input may correspond to multiple possible outputs. To handle multimodal translation, one possible approach is to inject a random noise vector into the generator.
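The first challenge, the lack of aligned pairs, is what the cross-cycle consistency loss mentioned in the abstract addresses. The sketch below illustrates one way such a loss can be assembled from disentangled (content, attribute) codes of two unpaired images: content representations are swapped across domains, the translated images are re-encoded, a second swap is required to reconstruct the original inputs, and the reconstruction error becomes the loss. The toy encoder/decoder architectures, the L1 reconstruction term, and all names here are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a cross-cycle consistency loss for unpaired training.
# Architectures and the exact swap/reconstruction scheme are illustrative
# assumptions based on the description in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

ATTR_DIM = 8


class Encoder(nn.Module):
    """Toy per-domain encoder returning a (content, attribute) pair."""

    def __init__(self):
        super().__init__()
        self.content = nn.Conv2d(3, 16, 3, padding=1)
        self.attr = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, ATTR_DIM)
        )

    def forward(self, x):
        return self.content(x), self.attr(x)


class Decoder(nn.Module):
    """Toy per-domain decoder mapping (content, attribute) back to an image."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(ATTR_DIM, 16)
        self.out = nn.Conv2d(16, 3, 3, padding=1)

    def forward(self, content, attr):
        a = self.fc(attr).unsqueeze(-1).unsqueeze(-1)
        return torch.tanh(self.out(content + a))


enc_a, enc_b = Encoder(), Encoder()
dec_a, dec_b = Decoder(), Decoder()

x_a = torch.randn(2, 3, 32, 32)   # unpaired images from domain A
x_b = torch.randn(2, 3, 32, 32)   # unpaired images from domain B

# First translation: swap content across domains (each decoder keeps an
# attribute from its own domain).
c_a, s_a = enc_a(x_a)
c_b, s_b = enc_b(x_b)
u = dec_a(c_b, s_a)               # B-content rendered with an A-attribute
v = dec_b(c_a, s_b)               # A-content rendered with a B-attribute

# Second translation: re-encode the translated images and swap back; the
# round trip should recover the original inputs.
c_u, s_u = enc_a(u)
c_v, s_v = enc_b(v)
x_a_rec = dec_a(c_v, s_u)
x_b_rec = dec_b(c_u, s_v)

# Cross-cycle consistency: reconstructing the unpaired inputs after the round
# trip removes the need for aligned image pairs.
loss_cc = F.l1_loss(x_a_rec, x_a) + F.l1_loss(x_b_rec, x_b)
```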