Spatio-Temporally Consistent Correspondence for Dense Dynamic Scene Modeling


Abstract. We address the problem of robust two-view correspondence estimation within the context of dynamic scene modeling. To this end, we investigate the use of local spatio-temporal assumptions to both identify and refine dense low-level data associations in the absence of prior dynamic content models. By developing a strictly data-driven approach to correspondence search, based on bottom-up local 3D motion cues of local rigidity and non-local coherence, we are able to robustly address the higher-order problems of video synchronization and dynamic surface modeling. Our findings suggest an important relationship between these two tasks, in that maximizing spatial coherence of surface points serves as a direct metric for the temporal alignment of local image sequences. The obtained results for these two problems on multiple publicly available dynamic reconstruction datasets illustrate both the effectiveness and generality of our proposed approach.

Keywords: Two-View correspondences · Motion consistency

1 Introduction

Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46466-4_1) contains supplementary material, which is available to authorized users.

© Springer International Publishing AG 2016. D. Ji et al. In: B. Leibe et al. (Eds.): ECCV 2016, Part VI, LNCS 9910, pp. 3–18, 2016. DOI: 10.1007/978-3-319-46466-4_1

Dynamic 3D scene modeling addresses the estimation of time-varying geometry from input imagery. Existing motion capture techniques have typically addressed well-controlled capture scenarios, where aspects such as camera positioning, sensor synchronization, and favorable scene content (e.g., fiducial markers or “green screen” backgrounds) are either carefully designed a priori or controlled online. Given the abundance of available crowd-sourced video content, there is growing interest in estimating dynamic 3D representations from uncontrolled video capture. Whereas multi-camera static scene reconstruction methods leverage photoconsistency across spatially varying observations, their dynamic counterparts must address photoconsistency in the spatio-temporal domain. In this respect, the main challenges are (1) finding a common temporal reference frame across independent video captures, and (2) meaningfully propagating temporally varying photoconsistency estimates across videos. These two correspondence problems – temporal correspondence search among unaligned video sequences and spatial correspondence for geometry estimation – must be solved jointly when performing dynamic 3D reconstruction on uncontrolled inputs.

In this work, we address both of these challenges by enforcing the geometric consistency of optical flow measurements across spatially registered video segments. Moreover, our approach builds on the thesis that maximally consistent geometry is obtained with minimal temporal alignment error, and vice versa. Towards this end, we posit that it is possible to recover the spatiotemporal overlap of two image sequences by maximizing the set of consistent spatio-t