Graph-Based Consistent Matching for Structure-from-Motion

Pairwise image matching of unordered image collections greatly affects the efficiency and accuracy of Structure-from-Motion (SfM). Insufficient match pairs may result in disconnected structures or incomplete components, while costly redundant pairs contai

  • PDF / 5,308,193 Bytes
  • 17 Pages / 439.37 x 666.142 pts Page_size
  • 36 Downloads / 278 Views

DOWNLOAD

REPORT


Abstract. Pairwise image matching of unordered image collections greatly affects the efficiency and accuracy of Structure-from-Motion (SfM). Insufficient match pairs may result in disconnected structures or incomplete components, while costly redundant pairs containing erroneous ones may lead to folded and superimposed structures. This paper presents a graph-based image matching method that tackles the issues of completeness, efficiency and consistency in a unified framework. Our approach starts by chaining all but singleton images using a visualsimilarity-based minimum spanning tree. Then the minimum spanning tree is incrementally expanded to form locally consistent strong triplets. Finally, a global community-based graph algorithm is introduced to strengthen the global consistency by reinforcing potentially large connected components. We demonstrate the superior performance of our method in terms of accuracy and efficiency on both benchmark and Internet datasets. Our method also performs remarkably well on the challenging datasets of highly ambiguous and duplicated scenes. Keywords: Structure-from-Motion consistency

1

·

Image

matching

·

Loop

Introduction

Image matching is a computationally expensive step in 3D reconstruction, especially for large-scale unordered image datasets. Due to the large number of high-dimensional feature descriptors in an image, the naive quadratic matching scheme imposes a heavy computational burden on large-scale high-resolution 3D reconstruction [35]. Tremendous progress has been achieved either on reducing the cost of feature matching [19] or image indexing techniques [15,22] to pre-compute a subset of match candidates. Modern large-scale Structure-fromMotion (SfM) systems [1,12] usually use vocabulary tree [22] to choose the visually similar match pairs, which decreases the complexity of pairwise image matching from O(n2 ) to O(kn) with respect to the number of images. Electronic supplementary material The online version of this chapter (doi:10. 1007/978-3-319-46487-9 9) contains supplementary material, which is available to authorized users. c Springer International Publishing AG 2016  B. Leibe et al. (Eds.): ECCV 2016, Part III, LNCS 9907, pp. 139–155, 2016. DOI: 10.1007/978-3-319-46487-9 9

140

T. Shen et al.

Fig. 1. Pipeline of the matching framework. (a) The minimum spanning tree (MST); (b) The triplet expansion process; (c) The final match graph after component merging, with different colors representing different communities; (d) Comparison of the proposed method with image retrieval techniques (see Sect. 5 for details); (e) The reference mesh model. (Color figure online)

However, two problems remain to be solved. One major drawback of the image indexing techniques is that the number of retrieved items k for a query image is hard to determine. Many previous works have adopted an empirical similarity threshold or a fixed retrieved number, which ignores the global connectivity of the image collection. Sometimes post-processing steps, such as query expansion [1,4], are also needed