View Synthesis by Appearance Flow

Abstract. We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows – 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques.
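To make the core mechanism concrete, the sketch below illustrates the two operations the abstract describes: reconstructing a target view by sampling input pixels at CNN-predicted flow locations, and fusing several single-view predictions with learned per-pixel confidences. This is a minimal PyTorch-style illustration, not the paper's implementation; the flow-predicting CNN is elided, and the function names, tensor shapes, and softmax weighting are assumptions made for the sake of the example.

```python
import torch
import torch.nn.functional as F

def synthesize_view(input_view: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Copy pixels from the input view according to an appearance flow.

    input_view: (N, 3, H, W) source image.
    flow:       (N, H, W, 2) predicted appearance flow; for each target
                pixel, the (x, y) location in the input view to copy from,
                normalized to [-1, 1] as grid_sample expects.
    """
    # Differentiable bilinear sampling: each output pixel is a bilinear
    # blend of the four input pixels nearest its flow target, so the whole
    # pipeline can be trained end-to-end with a pixel reconstruction loss.
    return F.grid_sample(input_view, flow, mode="bilinear", align_corners=True)

def combine_views(syntheses: list, confidences: list) -> torch.Tensor:
    """Fuse per-view syntheses of the same target view (multi-view case).

    syntheses:   V tensors of shape (N, 3, H, W), one per input view.
    confidences: V tensors of shape (N, 1, H, W), unnormalized per-pixel
                 confidence maps predicted alongside each flow field.
    """
    views = torch.stack(syntheses, dim=0)    # (V, N, 3, H, W)
    conf = torch.stack(confidences, dim=0)   # (V, N, 1, H, W)
    weights = torch.softmax(conf, dim=0)     # normalize across the V views
    return (weights * views).sum(dim=0)      # per-pixel weighted average
```

Because every output pixel is copied rather than generated from scratch, the synthesized view inherits the sharpness of the input image; the network only has to learn where to copy from, which is the intuition behind the perceptual-quality gains claimed in the abstract.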

1 Introduction

Consider the car in Fig. 1(a). What you are actually looking at is a flat two-dimensional image that is but a projection of the three-dimensional physical car. Yet numerous psychophysics experiments tell us that what you are seeing is not the 2D image but the 3D object it represents. For example, one classic experiment demonstrates that people excel at "mental rotation" [2] – predicting what a given object would look like after a known 3D rotation is applied. In this paper, we study the computational equivalent of mental rotation, called novel view synthesis: given one or more input images of an object or a scene plus the desired viewpoint transformation, the goal is to synthesize a new image capturing this novel view, as shown in Fig. 1.

Fig. 1. Given an input image, our goal is to synthesize novel views of the same object (left) or scene (right) corresponding to various camera transformations (Ti). Our approach, based on learning appearance flows, is able to generate higher-quality results than the previous method that directly outputs pixels in the target view [1].

Besides purely academic interest (how well can this be done?), novel view synthesis has a plethora of practical applications, mostly in computer graphics and virtual reality. For example, it could enable photo-editing programs like Photoshop to manipulate objects in 3D instead of 2D, or it could help create full virtual-reality environments based on historic images or video footage.

The ways that novel view synthesis has been approached in the past fall into two broad categories: geometry-based approaches and learning-based approaches. Geometric approaches try to first estimate (or fake) the approximate underlying 3D structure of the object, and then apply some transformation to the pixels in the input image to produce the output [3–9]. Besides the requirement of somehow estimating the 3D structure,