Interpreting the Ratio Criterion for Matching SIFT Descriptors

Matching keypoints by minimizing the Euclidean distance between their SIFT descriptors is an effective and extremely popular technique. Using the ratio between distances, as suggested by Lowe, is even more effective and leads to excellent matching accurac

  • PDF / 533,743 Bytes
  • 16 Pages / 439.37 x 666.142 pts Page_size
  • 4 Downloads / 195 Views

DOWNLOAD

REPORT


Abstract. Matching keypoints by minimizing the Euclidean distance between their SIFT descriptors is an effective and extremely popular technique. Using the ratio between distances, as suggested by Lowe, is even more effective and leads to excellent matching accuracy. Probabilistic approaches that model the distribution of the distances were found effective as well. This work focuses, for the first time, on analyzing Lowe’s ratio criterion using a probabilistic approach. We provide two alternative interpretations of this criterion, which show that it is not only an effective heuristic but can also be formally justified. The first interpretation shows that Lowe’s ratio corresponds to a conditional probability that the match is incorrect. The second shows that the ratio corresponds to the Markov bound on this probability. The interpretations make it possible to slightly increase the effectiveness of the ratio criterion, and to obtain matching performance that exceeds all previous (non-learning based) results.

Keywords: SIFT

1

· Matching · a contrario

Introduction

Matching objects in different images is a fundamental task in computer vision, with applications in object recognition, panorama stitching, and many more. The common practice is to extract a set of distinctive keypoints from each image, compute a descriptor for each keypoint, and then match the keypoints using a similarity measure between the descriptors and possibly also geometric constraints. Many methods for detecting the keypoints and computing their descriptors have been proposed. See the reviews [11,23,25]. The scale invariant feature transform (SIFT) suggested by Lowe [8,9] is arguably the dominant algorithm for both keypoint detection and keypoint description. It specifies feature points and corresponding neighborhoods as maxima in the scale space of the DoG operator. The descriptor itself is a set of histograms of gradient directions calculated in several (16) regions in this neighborhood, concatenated into a 128-dimensional vector. Various normalizations and several filtering stages help to optimize the descriptor. The combination of scale space, gradient direction, and histograms makes the SIFT descriptor robust to scale, rotation, and illumination changes, and yet discriminative. Keypoints are c Springer International Publishing AG 2016  B. Leibe et al. (Eds.): ECCV 2016, Part V, LNCS 9909, pp. 697–712, 2016. DOI: 10.1007/978-3-319-46454-1 42

698

A. Kaplan et al.

matched by minimizing the Euclidean distance between their SIFT descriptors. However, to rank the matches, it is much more effective to use the distance ratios: ratio(ai , bj(i) ) =

ai − bj(i) 2 ai − bj  (i) 2

(1)

and not the distances themselves [8,9]. Here, ai denotes a descriptor in one image, and bj(i) , bj  (i) correspond to the closest and the second-closest descriptors in the other image. SIFT has been challenged by many competing descriptors. The variations try to achieve faster runtime (e.g. SURF [2]), robustness to affine transformation (ASIFT [14]), compatibility with colo