ShapeFit and ShapeKick for Robust, Scalable Structure from Motion


1 Department of Computer Science, University of Maryland, College Park, MD, USA
2 Department of Computational and Applied Mathematics, Rice University, Houston, TX, USA
3 Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
4 Department of Computer Science, University of California, Los Angeles, CA, USA

Abstract. We introduce a new method for location recovery from pairwise directions that leverages an efficient convex program with exact recovery guarantees, even in the presence of adversarial outliers. When pairwise directions represent scaled relative positions between pairs of views (estimated, for instance, with epipolar geometry), our method can be used for location recovery, that is, the determination of relative pose up to a single unknown scale. For this task, our method yields performance comparable to the state of the art with an order-of-magnitude speed-up. Our proposed numerical framework is flexible in that it accommodates other approaches to location recovery and can be used to speed up other methods. We demonstrate these properties by extensively testing against state-of-the-art location recovery methods on 13 large, irregular collections of images of real scenes, in addition to simulated data with ground truth.

Keywords: Structure from motion · Convex optimization · Corruption-robust recovery
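The abstract frames location recovery as estimating camera locations t_1, …, t_n, up to a global translation and a single scale, from unit directions between pairs of locations. To make that setup concrete, here is a minimal sketch that deliberately swaps in a plain least-squares relaxation rather than the paper's convex program: it seeks unit-norm locations whose differences t_i − t_j have no component orthogonal to the given direction γ_ij, and penalizes the sum of locations to remove the translation ambiguity. Everything here is our own assumption, not taken from the text: the direction convention γ_ij = (t_i − t_j)/‖t_i − t_j‖, noiseless directions, a complete view graph, and the hypothetical helper names `perp_projector` and `recover_locations_lsq`.

```python
import itertools
import numpy as np


def perp_projector(g):
    """Hypothetical helper: 3x3 projector onto the plane orthogonal to unit vector g."""
    return np.eye(3) - np.outer(g, g)


def recover_locations_lsq(n, edges, directions):
    """Hypothetical least-squares sketch (NOT the paper's robust convex program).

    Over unit-norm t (n x 3), minimizes
        sum_ij ||P_perp(g_ij) (t_i - t_j)||^2 + ||sum_i t_i||^2.
    The second term penalizes the global-translation ambiguity; the unit
    norm fixes the scale. Squared residuals make this sensitive to outliers.
    """
    blocks = []
    for (i, j), g in zip(edges, directions):
        P = perp_projector(g)
        row = np.zeros((3, 3 * n))
        row[:, 3 * i:3 * i + 3] = P    # +P t_i
        row[:, 3 * j:3 * j + 3] = -P   # -P t_j
        blocks.append(row)
    # Rows computing sum_i t_i, so a pure translation is not a zero-cost solution.
    blocks.append(np.tile(np.eye(3), (1, n)))
    M = np.vstack(blocks)
    # Right singular vector of the smallest singular value = unit-norm minimizer.
    _, _, vt = np.linalg.svd(M)
    t = vt[-1].reshape(n, 3)
    # Resolve the sign so that t_i - t_j points along g_ij on aggregate.
    agreement = sum(np.dot(t[i] - t[j], g) for (i, j), g in zip(edges, directions))
    return t if agreement >= 0 else -t


# Synthetic check: generic locations, complete graph, exact (noiseless) directions.
rng = np.random.default_rng(0)
n = 8
t_true = rng.normal(size=(n, 3))
edges = list(itertools.combinations(range(n), 2))
directions = [(t_true[i] - t_true[j]) / np.linalg.norm(t_true[i] - t_true[j])
              for i, j in edges]

t_hat = recover_locations_lsq(n, edges, directions)

# Ground truth is defined only up to translation and positive scale.
t_ref = t_true - t_true.mean(axis=0)
t_ref /= np.linalg.norm(t_ref)
print(np.max(np.abs(t_hat - t_ref)))  # close to zero for noiseless input
```

Because the residuals are squared, a few grossly wrong directions can pull the whole estimate away from the truth. The exact-recovery guarantee under adversarial outliers claimed in the abstract refers to the paper's own convex program, which this sketch does not implement.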

1 Introduction

T. Goldstein, P. Hand, C. Lee and V. Voroninski: these authors contributed equally.
Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46478-7_18) contains supplementary material, which is available to authorized users.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VII, LNCS 9911, pp. 289–304, 2016. DOI: 10.1007/978-3-319-46478-7_18

The typical structure-from-motion (SfM) pipeline consists of (i) establishing sparse correspondence between local regions in different images of a (mostly) rigid scene; (ii) exploiting constraints induced by epipolar geometry to obtain initial estimates of the relative pose (position and orientation) between pairs or triplets of views from which the images were captured, where each relative position is determined only up to an arbitrary scale; (iii) reconciling all estimates and their scales to arrive at a consistent estimate up to a single global scale; and finally (iv) performing bundle adjustment to refine the estimates of pose as well as the positions of the sparse points in three-dimensional (3D) space that gave rise to the local regions in (i), also known as feature points. As in any cascade method,¹ the overall solution is sensitive to failures in the early stages. While significant effort has gone into designing better descriptors for use in stage (i) of the pipeline, sparse correspondence is intrinsically local and therefore subject to ambiguity. This forces subsequent stages (ii) and (iii) to deal with inevitable correspondence failures, often by solving combinatorial