LIFT: Learned Invariant Feature Transform
1 Computer Vision Laboratory, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
{kwang.yi,eduard.trulls,pascal.fua}@epfl.ch
2 Institute for Computer Graphics and Vision, Graz University of Technology, Graz, Austria
[email protected]
Abstract. We introduce a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each one of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our Deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need for retraining.

Keywords: Local features · Feature descriptors · Deep Learning

1 Introduction
Local features play a key role in many Computer Vision applications. Finding and matching them across images has been the subject of vast amounts of research. Until recently, the best techniques relied on carefully hand-crafted features [1–5]. Over the past few years, as in many areas of Computer Vision, methods based on Machine Learning, and more specifically Deep Learning, have started to outperform these traditional methods [6–10]. These new algorithms, however, address only a single step in the complete processing chain, which includes detecting the features, computing their orientation, and extracting robust representations that allow us to match them across images. In this paper we introduce a novel Deep architecture that performs all three steps together. We demonstrate that it achieves better overall performance than the state-of-the-art methods, in large part because it allows these individual steps to be optimized to perform well in conjunction with each other.

K.M. Yi and E. Trulls contributed equally. This work was supported in part by the EU FP7 project MAGELLAN under grant number ICT-FP7-611526.
Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46466-4_28) contains supplementary material, which is available to authorized users.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VI, LNCS 9910, pp. 467–483, 2016. DOI: 10.1007/978-3-319-46466-4_28
Fig. 1. Our integrated feature extraction pipeline. Our pipeline consists of three major components: the Detector, the Orientation Estimator, and the Descriptor. They are tied together with differentiable operations to preserve end-to-end differentiability. (Figures are best viewed in color.)
Our architecture, which we refer to as LIFT for Learned Invariant Feature Transform, is depicted in Fig. 1. It consists of three components that feed into each other: the Detector, the Orientation Estimator, and the Descriptor. Each one is based on Convolutional Neural Networks (CNNs), and patterned after recent ones [6,9,10] that have been shown to perform these individual functions well. To mesh them together we
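The data flow between the three components can be sketched as a single chained computation. The toy NumPy sketch below replaces each CNN with a hand-coded stand-in, so only the pipeline structure is illustrated: a score map yields a (soft-argmax) keypoint location, a patch around it yields an orientation, and the patch yields a unit-norm descriptor. All function names, shapes, and the fixed crop are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softargmax(score_map):
    """Softmax-weighted mean of pixel coordinates.
    A differentiable surrogate for argmax, which is why keypoint
    selection can be trained end-to-end in an architecture like LIFT."""
    w = np.exp(score_map - score_map.max())
    w /= w.sum()
    ys, xs = np.mgrid[0:score_map.shape[0], 0:score_map.shape[1]]
    return float((w * ys).sum()), float((w * xs).sum())

def detect(image):
    # Stand-in for the Detector CNN: produce a score map over the image.
    return image  # toy choice: treat intensities as scores

def estimate_orientation(patch):
    # Stand-in for the Orientation Estimator CNN: one angle per patch,
    # here derived from the mean image gradient.
    gy, gx = np.gradient(patch)
    return float(np.arctan2(gy.mean(), gx.mean()))

def describe(patch):
    # Stand-in for the Descriptor CNN: a fixed-length unit vector.
    v = patch.flatten()
    return v / (np.linalg.norm(v) + 1e-8)

image = np.random.default_rng(0).random((32, 32))
score = detect(image)
y, x = softargmax(score)          # differentiable keypoint location
patch = image[8:24, 8:24]         # crop around keypoint (fixed here for brevity)
theta = estimate_orientation(patch)
desc = describe(patch)            # in LIFT the patch is rotated by theta first
```

In the real network each stand-in is a CNN and the crop/rotation steps are themselves differentiable image-sampling operations, which is what lets gradients flow from the Descriptor back into the Detector.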