Volume Sweeping: Learning Photoconsistency for Multi-View Shape Reconstruction

Vincent Leroy 1,2,3 · Jean-Sébastien Franco 1,2 · Edmond Boyer 1,2

Received: 23 May 2019 / Accepted: 24 August 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

We propose a full study and methodology for multi-view stereo reconstruction with performance capture data. Multi-view 3D reconstruction has largely been studied with general, high-resolution, high-texture-content inputs, where classic low-level feature extraction and matching are generally successful. In performance capture scenarios, however, texture content is limited by wider-angle shots, which yield smaller subject projection areas, and by the intrinsically low image content of casual clothing. We present a dedicated pipeline, based on a per-camera depth map sweeping strategy, analyzing in particular how recent deep network advances allow classic multi-view photoconsistency functions to be replaced with a learned one. We show that learning based on a volumetric receptive field around a 3D depth candidate improves over using per-view 2D windows, giving the photoconsistency inference more visibility over local 3D correlations in viewpoint color aggregation. Despite being trained on a standard dataset of scanned static objects, the proposed method is shown to generalize: it significantly outperforms existing approaches on performance capture data while achieving competitive results on recent benchmarks.

Keywords Multi-view stereo reconstruction · Learned photoconsistency · Performance capture · Volume sweeping
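The per-camera sweeping strategy summarized above can be illustrated with a minimal plane-sweep sketch: for a reference pixel, depth candidates are sampled along its viewing ray, back-projected into a second view, and scored for photoconsistency. The sketch below uses a hand-written ZNCC patch score as a stand-in for the learned photoconsistency function of the paper; the function names, camera parameters, and synthetic test data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def zncc(a, b, eps=1e-8):
    """Zero-mean normalized cross-correlation between two patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def sweep_depths(ref_img, src_img, K, R, t, pixel, depths, win=3):
    """For one reference pixel, test each depth candidate by back-projecting
    the pixel to 3D, reprojecting into the source view, and scoring patch
    photoconsistency (ZNCC here; a learned function in the paper).
    Returns the best-scoring depth."""
    u, v = pixel
    h = win // 2
    Kinv = np.linalg.inv(K)
    scores = []
    for d in depths:
        X = d * (Kinv @ np.array([u, v, 1.0]))   # 3D point in the reference camera frame
        x = K @ (R @ X + t)                      # projection into the source view
        us, vs = int(round(x[0] / x[2])), int(round(x[1] / x[2]))
        # guard against projections falling outside the source image
        if not (h <= us < src_img.shape[1] - h and h <= vs < src_img.shape[0] - h):
            scores.append(-1.0)
            continue
        pa = ref_img[v - h:v + h + 1, u - h:u + h + 1]
        pb = src_img[vs - h:vs + h + 1, us - h:us + h + 1]
        scores.append(zncc(pa, pb))
    return depths[int(np.argmax(scores))]
```

For a fronto-parallel scene seen by two cameras related by a pure horizontal baseline, the correct depth reduces the source patch to a horizontally shifted copy of the reference patch, so the ZNCC score peaks at the true depth; a learned score replaces the ZNCC call without changing the sweep structure.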

1 Introduction

In this paper, we examine the problem of multi-view shape reconstruction of production-realistic performance capture sequences. Such sequences may contain arbitrary casual clothing and motions, and come with specific capture-set assumptions due to the particular lighting and camera positioning of these setups. Multi-view 3D reconstruction is a popular and mature field, with numerous applications involving the recording and replay of captured 3D scenes, such as 3D content creation for broadcast and mobile applications, or the

B Vincent Leroy
[email protected]

Jean-Sébastien Franco
[email protected]

Edmond Boyer
[email protected]

1 Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
2 Institute of Engineering Univ. Grenoble Alpes, Grenoble, France
3 NAVER LABS Europe, 6 chemin de Maupertuis, 38240 Meylan, France

increasingly popular virtual and augmented reality applications with 3D user avatars. An essential and still-improving aspect of this problem, in particular with performance capture setups, is the fidelity and quality of the recovered shapes, which is our goal in this work (Fig. 1). Multi-view stereo (MVS) based methods have attained a good level of quality with pipelines that typically comprise feature extraction, matching stages, and 3D shape extraction. Interestingly, very recent works have re-examined stereo and MVS by introducing features and similarity functions automati