

ORIGINAL ARTICLE

Feature-attention module for context-aware image-to-image translation

Jing Bai¹,² · Ran Chen¹ · Min Liu³

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

In summer2winter image-to-image translation, trees should be transformed from green to gray, but the colors of houses or girls should not be changed. However, current unsupervised one-to-one image translation techniques fail to focus the translation on individual objects. To tackle this issue, we propose a novel feature-attention (FA) module that captures the mutual influences among features, so that the network automatically attends only to specific scene objects during unsupervised image-to-image translation. The proposed module can be integrated into different image translation networks to improve their context-aware translation ability. Qualitative and quantitative experiments on the horse2zebra, apple2orange and summer2winter datasets based on DualGAN, CycleGAN and UNIT demonstrate a significant improvement of our proposed module over the state-of-the-art methods. In addition, experiments on the apple2orange dataset based on MUNIT and DRIT further indicate the effectiveness of the FA module in multimodal translation tasks. We also show that the computational complexity of the proposed module is linear in the image size; moreover, experiments on the day2night dataset show that the proposed module is insensitive to growth in image resolution. The source code and trained models are available at https://github.com/gaoyuainshuyi/fa.

Keywords Feature-attention · Unsupervised image-to-image translation · Context-aware
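The authors' exact FA architecture is defined in the body of the paper; purely as an illustration of the kind of block the abstract describes (feature-wise attention whose cost grows linearly with the number of pixels), the following PyTorch sketch implements a generic "efficient attention"-style layer. It is an assumption, not the paper's module: the class name FeatureAttention, the 1×1 query/key/value convolutions, the key-channel reduction and the learned residual weight gamma are all illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAttention(nn.Module):
    """Illustrative attention block with cost linear in H*W.

    NOTE: this is NOT the authors' FA module; it is a generic
    "efficient attention"-style sketch matching the abstract's
    linear-complexity claim. The class name, the key-channel
    reduction factor and the learned residual weight are assumptions.
    """

    def __init__(self, in_channels, key_channels=None):
        super().__init__()
        key_channels = key_channels or max(in_channels // 8, 1)
        self.query = nn.Conv2d(in_channels, key_channels, kernel_size=1)
        self.key = nn.Conv2d(in_channels, key_channels, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = F.softmax(self.query(x).flatten(2), dim=1)   # (b, k, h*w)
        k = F.softmax(self.key(x).flatten(2), dim=2)     # (b, k, h*w)
        v = self.value(x).flatten(2)                     # (b, c, h*w)
        # Aggregate a small global context first: (b, c, h*w) @ (b, h*w, k)
        # costs O(h*w * c * k), i.e., linear in the number of pixels.
        context = v @ k.transpose(1, 2)                  # (b, c, k)
        out = (context @ q).view(b, c, h, w)             # (b, c, h, w)
        return x + self.gamma * out                      # residual connection

# Example: attend over a feature map inside a translation generator.
fa = FeatureAttention(in_channels=256)
features = torch.randn(2, 256, 64, 64)
print(fa(features).shape)  # torch.Size([2, 256, 64, 64])
```

Because the (c × k) context is formed before it is applied to the queries, the matrix products avoid the quadratic pixel-to-pixel affinity map of standard non-local attention, which is consistent with the abstract's linear-complexity claim.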

1 Introduction

Image translation is a long-standing and challenging task in computer vision, and many problems can be viewed as instances of it, such as super-resolution [1–3], colorization [4], inpainting [5] and style transfer [6]. An ideal network for image translation should be context-aware: it should not only find what the differences between the source domain and the target domain are, but also judge which parts should be changed and which parts should be retained during the translation process, according to the context of the translation task.

* Corresponding author: Jing Bai, [email protected]

1 School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
2 Ningxia Province Key Laboratory of Intelligent Information and Data Processing, Yinchuan 750021, China
3 School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907, USA

When networks are trained on paired examples in a supervised setting, image translation can be approached with a conditional generative model [7, 8] or a simple regression model [9], guided by the target regions of the paired examples. In the unsupervised case, however, the networks are unable to judge which specific scene objects should be changed, in the absence of paired or aligned examples. For instance, in Fig. 1, when the image in summer is translated into winter (s →