Pixelwise View Selection for Unstructured Multi-View Stereo

1 ETH Zürich, Zürich, Switzerland, {jsch,pomarc}@inf.ethz.ch
2 UNC Chapel Hill, Chapel Hill, USA, {ezheng,jmf}@cs.unc.edu
3 Microsoft, Redmond, USA

Abstract. This work presents a Multi-View Stereo system for robust and efficient dense modeling from unstructured image collections. Our core contributions are the joint estimation of depth and normal information, pixelwise view selection using photometric and geometric priors, and a multi-view geometric consistency term for the simultaneous refinement and image-based depth and normal fusion. Experiments on benchmarks and large-scale Internet photo collections demonstrate state-of-the-art performance in terms of accuracy, completeness, and efficiency.

1 Introduction

Large-scale 3D reconstruction from Internet photos has seen a tremendous evolution in sparse modeling using Structure-from-Motion (SfM) [1–8] and in dense modeling using Multi-View Stereo (MVS) [9–15]. Many applications benefit from a dense scene representation, e.g., classification [16], image-based rendering [17], and localization [18]. Despite the widespread use of MVS, the efficient and robust estimation of accurate, complete, and aesthetically pleasing dense models in uncontrolled environments remains a challenging task. Dense pixelwise correspondence search is the core problem of stereo methods. Recovering correct correspondence is challenging even in controlled environments with known viewing geometry and illumination. In uncontrolled settings, e.g., where the input consists of crowd-sourced images, it is crucial to account for various factors, such as heterogeneous resolution and illumination, scene variability, unstructured viewing geometry, and mis-registered views.

Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46487-9_31) contains supplementary material, which is available to authorized users. © Springer International Publishing AG 2016. J.L. Schönberger et al., in B. Leibe et al. (Eds.): ECCV 2016, Part III, LNCS 9907, pp. 501–518, 2016. DOI: 10.1007/978-3-319-46487-9_31

Fig. 1. Reconstructions for Louvre, Todai-ji, Paris Opera, and Astronomical Clock.

Our proposed approach improves the state of the art in dense reconstruction for unstructured images. This work leverages the optimization framework by Zheng et al. [14] to propose the following core contributions: (1) Pixelwise normal estimation embedded into an improved PatchMatch sampling scheme. (2) Pixelwise view selection using triangulation angle, incident angle, and image resolution-based geometric priors. (3) Integration of a “temporal” view selection smoothness term. (4) Adaptive window support through bilateral photometric consistency for improved occlusion boundary behavior. (5) Introduction of a multi-view geometric consistency term for simultaneous depth/normal estimation and image-based fusion. (6) Reliable depth/normal filtering and fusion. Outlier-free and accurate depth/normal estimates further allow for direct meshing of the resulting point cloud. We achieve state-of-the-art results