Image Matching Across Wide Baselines: From Paper to Practice



Yuhe Jin1 · Dmytro Mishkin2 · Anastasiia Mishchuk3 · Jiri Matas2 · Pascal Fua3 · Kwang Moo Yi1 · Eduard Trulls4

Received: 8 May 2020 / Accepted: 11 September 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

We introduce a comprehensive benchmark for local features and robust estimation algorithms, focusing on the downstream task, the accuracy of the reconstructed camera pose, as our primary metric. Our pipeline's modular structure allows easy integration, configuration, and combination of different methods and heuristics. We demonstrate this by embedding dozens of popular algorithms, from seminal works to the cutting edge of machine learning research, and evaluating them. We show that with proper settings, classical solutions may still outperform the perceived state of the art. Besides establishing the actual state of the art, the experiments reveal unexpected properties of structure-from-motion pipelines that can help improve their performance, for both algorithmic and learned methods. Data and code are online (https://github.com/ubcvision/image-matching-benchmark), providing an easy-to-use and flexible framework for benchmarking local features and robust estimation methods, both alongside and against top-performing methods. This work provides the basis for the Image Matching Challenge (https://image-matching-challenge.github.io).

Keywords Benchmark · Dataset · Stereo · Structure from motion · Local features · 3D reconstruction
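The primary metric named in the abstract, the accuracy of the reconstructed camera pose, is typically computed as an angular error between the estimated and ground-truth relative pose (rotation, and translation direction up to scale), then summarized as the fraction of poses below an error threshold, averaged over a range of thresholds. A minimal sketch of that style of evaluation follows; the function names, the 10-degree threshold range, and the use of NumPy are illustrative assumptions, not the benchmark's actual code:

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    # Angle of the residual rotation R_est^T R_gt, in degrees.
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error_deg(t_est, t_gt):
    # Angle between translation directions; two-view geometry
    # only recovers translation up to scale, so compare directions.
    cos = np.dot(t_est, t_gt) / (np.linalg.norm(t_est) * np.linalg.norm(t_gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def mean_average_accuracy(errors_deg, max_threshold_deg=10, step_deg=1):
    # Fraction of poses with error below each threshold,
    # averaged over all thresholds (an area-under-the-curve proxy).
    errors = np.asarray(errors_deg)
    thresholds = np.arange(step_deg, max_threshold_deg + step_deg, step_deg)
    return float(np.mean([(errors <= t).mean() for t in thresholds]))
```

Averaging accuracy over many thresholds, rather than reporting a single cutoff, rewards methods that are consistently accurate instead of ones tuned to a particular tolerance.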

Communicated by Konrad Schindler.

This work was partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant “Deep Visual Geometry Machines” (RGPIN-2018-03788), by systems supplied by Compute Canada, and by Google’s Visual Positioning Service. DM and JM were supported by OP VVV funded Project CZ.02.1.01/0.0/0.0/16 019/0000765 “Research Center for Informatics”. DM was also supported by CTU student Grant SGS17/185/OHK3/3T/13 and by the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry for Digital and Economic Affairs, and the Province of Upper Austria in the frame of the COMET center SCCH. AM was supported by the Swiss National Science Foundation.

1 Introduction

Matching two or more views of a scene is at the core of fundamental computer vision problems, including image retrieval (Lowe 2004; Arandjelovic et al. 2016; Radenovic et al. 2016; Tolias et al. 2016; Noh et al. 2017), 3D reconstruction (Agarwal et al. 2009; Heinly et al. 2015; Schönberger and Frahm 2016; Zhu et al. 2018), re-localization (Sattler et al. 2012, 2018; Lynen et al. 2019), and SLAM (Mur-Artal et al. 2015; DeTone et al. 2017, 2018). Despite decades of research, image matching remains unsolved in the general, wide-baseline scenario. Image matching is a challenging problem with many factors that need to be taken into account, e.g., viewpoint, illumination, occlusions, and camera prop-

Correspondence to: Eduard Trulls, [email protected]
Yuhe Jin, y