Perceptual Losses for Real-Time Style Transfer and Super-Resolution



Abstract. We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
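To make the idea concrete, the following is a minimal sketch of a feature reconstruction loss of the kind the abstract describes: distances are measured between activations of a pretrained network rather than between raw pixels. This is an illustration in PyTorch, not the authors' implementation; the choice of VGG-16 layer (index 8, i.e. relu2_2) is an assumption.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Illustrative sketch (not the paper's code): a perceptual loss that
# compares feature maps of a pretrained VGG-16 instead of raw pixels.
# The layer choice (index 8 = relu2_2) is an assumption.

class FeatureReconstructionLoss(torch.nn.Module):
    def __init__(self, layer_index: int = 8):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        # Truncate the pretrained network at the chosen activation and freeze it.
        self.features = torch.nn.Sequential(*list(vgg.children())[:layer_index + 1]).eval()
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, output_image: torch.Tensor, target_image: torch.Tensor) -> torch.Tensor:
        # Mean squared error between the two images' feature maps.
        return F.mse_loss(self.features(output_image), self.features(target_image))
```

During training, such a module would replace a pixel-space loss: the transformation network's output and the ground-truth image are both passed through the frozen feature extractor, and gradients flow back only into the transformation network.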

Keywords: Style transfer · Super-resolution · Deep learning

1 Introduction

Many classic problems can be framed as image transformation tasks, where a system receives some input image and transforms it into an output image. Examples from image processing include denoising, super-resolution, and colorization, where the input is a degraded image (noisy, low-resolution, or grayscale) and the output is a high-quality color image. Examples from computer vision include semantic segmentation and depth estimation, where the input is a color image and the output image encodes semantic or geometric information about the scene. One approach for solving image transformation tasks is to train a feed-forward convolutional neural network in a supervised manner, using a per-pixel loss function to measure the difference between output and ground-truth images.
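As a concrete illustration of this baseline, here is a minimal, hypothetical PyTorch sketch of per-pixel supervised training; the one-layer "network" and random tensors are placeholders for a real transformation network and dataset:

```python
import torch
import torch.nn.functional as F

# Sketch of the per-pixel baseline: train a feed-forward network so that
# its output matches the ground-truth image pixel by pixel. The one-layer
# network and random tensors below are placeholders for illustration only.

model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stand-in transform net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    input_image = torch.rand(4, 3, 64, 64)   # degraded inputs (random stand-ins)
    target_image = torch.rand(4, 3, 64, 64)  # corresponding ground-truth images
    output_image = model(input_image)
    loss = F.mse_loss(output_image, target_image)  # per-pixel loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```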

[Fig. 1 panels — style transfer row: Content, Gatys et al. [11], Ours, Style; super-resolution row: Bicubic, SRCNN [13], Perceptual loss, Ground Truth]

Fig. 1. Example results for style transfer (top) and ×4 super-resolution (bottom). For style transfer, we achieve similar results as Gatys et al. [11] but are three orders of magnitude faster. For super-resolution, our method trained with a perceptual loss is able to better reconstruct fine details compared to methods trained with a per-pixel loss.

This approach has been used, for example, by Dong et al. for super-resolution [1], by Cheng et al. for colorization [2,3], by Long et al. for segmentation [4], and by Eigen et al. for depth and surface normal prediction [5,6]. Such approaches are efficient at test-time, requiring only a forward pass through the trained network.
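The test-time efficiency is easy to see in code: once trained, the network is applied exactly once per image, with no per-image optimization loop. Again a hypothetical sketch with a placeholder model:

```python
import torch

# Test-time sketch: transformation is a single forward pass, in contrast
# to optimization-based methods that iterate per image. The one-layer
# model is a placeholder for a trained transformation network.

model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1).eval()
with torch.no_grad():
    input_image = torch.rand(1, 3, 256, 256)  # placeholder input image
    output_image = model(input_image)          # one forward pass, no optimization
```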