Learning Covariant Feature Detectors



Abstract. Local covariant feature detection, namely the problem of extracting viewpoint invariant features from images, has so far largely resisted the application of machine learning techniques. In this paper, we propose the first fully general formulation for learning local covariant feature detectors. We propose to cast detection as a regression problem, enabling the use of powerful regressors such as deep neural networks. We then derive a covariance constraint that can be used to automatically learn which visual structures provide stable anchors for local feature detection. We support these ideas theoretically, proposing a novel analysis of local features in terms of geometric transformations, and we show that all common and many uncommon detectors can be derived in this framework. Finally, we present empirical results on translation and rotation covariant detectors on standard feature benchmarks, showing the power and flexibility of the framework.

1 Introduction

Image matching, i.e. the problem of establishing point correspondences between two images of the same scene, is central to computer vision. In the past two decades, this problem stimulated the creation of numerous viewpoint invariant local feature detectors. These were also adopted in problems such as large scale image retrieval and object category recognition, as general-purpose image representations. More recently, however, deep learning has replaced local features as the preferred method to construct image representations; in fact, the most recent works on local feature descriptors are now based on deep learning [10,46].

Differently from descriptors, the problem of constructing local feature detectors has so far largely resisted machine learning. The goal of a detector is to extract stable local features from images, which is an essential step in any matching algorithm based on sparse features. It may be surprising that machine learning has not been very successful at this task, given that it has proved very useful in many other detection problems. We believe that the reason is the difficulty of devising a learning formulation for viewpoint invariant features. To clarify this difficulty, note that the fundamental aim of a local feature detector is to extract the same features from images regardless of effects such as viewpoint changes. In computer vision, this behavior is more formally called covariant detection. Handcrafted detectors achieve it by anchoring features to image structures, such as corners or blobs, that are preserved under a viewpoint change.
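To make the notion of covariant detection concrete, the sketch below shows what the covariance constraint mentioned in the abstract could look like in the simplest case of a translation-covariant detector trained by regression: if a patch is translated by t, the detected point should move by t as well. The network `SmallCNN`, the loss form, and the sign convention are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumption, not the paper's code) of a covariance
# constraint for a translation-covariant detector phi trained by regression.
# If patch_b shows the same content as patch_a translated by t, covariance
# requires phi(patch_b) to approximately equal phi(patch_a) + t, so we
# penalise the residual of that identity.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy regressor phi: maps a grayscale patch to a 2-D point."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2),
        )

    def forward(self, x):
        return self.net(x)

def covariance_loss(phi, patch_a, patch_b, t):
    """patch_a, patch_b: (B, 1, H, W) patches; t: (B, 2) known translation
    taking patch_a's content to patch_b's (sign convention is a choice here).
    The loss is zero exactly when phi commutes with the translation."""
    return ((phi(patch_b) - phi(patch_a) - t) ** 2).sum(dim=1).mean()

# Usage: synthetic batch, one gradient step.
phi = SmallCNN()
opt = torch.optim.SGD(phi.parameters(), lr=1e-3)
patch_a = torch.randn(8, 1, 32, 32)
patch_b = torch.randn(8, 1, 32, 32)   # in practice: patch_a shifted by t
t = torch.randn(8, 2)
loss = covariance_loss(phi, patch_a, patch_b, t)
loss.backward()
opt.step()
```

Note that no manual annotation of "good" anchors enters the loss: only the known transformation between the two patches is used, which is what lets the regressor discover stable structures on its own.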

Fig. 1. Detection by regression. We train a neural network φ that, given a patch x|p around each pixel p in an image, produces a displacement vector hp = φ(x|p) pointing to the nearest feature location (middle column). Displacements from nearby pixels are then pooled to detect features (right column).
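As a rough illustration of the pooling step in Fig. 1, the sketch below accumulates the per-pixel displacement vectors as votes and reads features off as local maxima of the vote map. The voting grid, the rounding, and the `min_votes` threshold are assumptions made for this example; the paper's actual pooling procedure may differ.

```python
# Rough sketch (assumption) of detection-by-regression pooling: every pixel p
# votes for the location p + h_p predicted by the regressor, and features are
# taken as peaks of the accumulated vote map.
import numpy as np
from scipy.ndimage import maximum_filter

def pool_displacements(h):
    """h: (H, W, 2) array of per-pixel displacements (dy, dx).
    Returns an (H, W) vote map."""
    H, W = h.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    ty = np.rint(ys + h[..., 0]).astype(int)
    tx = np.rint(xs + h[..., 1]).astype(int)
    valid = (ty >= 0) & (ty < H) & (tx >= 0) & (tx < W)
    votes = np.zeros((H, W))
    np.add.at(votes, (ty[valid], tx[valid]), 1.0)  # scatter-add the votes
    return votes

def detect_features(votes, min_votes=10):
    """Local maxima of the vote map (3x3 non-maximum suppression)."""
    peaks = (votes == maximum_filter(votes, size=3)) & (votes >= min_votes)
    return np.argwhere(peaks)  # (N, 2) array of (y, x) feature locations

# Usage with a synthetic displacement field in which every pixel points
# to the location (20, 20); the detector recovers exactly that point.
h = np.zeros((64, 64, 2))
h[..., 0] = 20 - np.arange(64)[:, None]   # dy
h[..., 1] = 20 - np.arange(64)[None, :]   # dx
print(detect_features(pool_displacements(h)))  # -> [[20 20]]
```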