LIFT: Learned Invariant Feature Transform
1 Computer Vision Laboratory, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
{kwang.yi,eduard.trulls,pascal.fua}@epfl.ch
2 Institute for Computer Graphics and Vision, Graz University of Technology, Graz, Austria
[email protected]
Abstract. We introduce a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each one of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our Deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need for retraining.

Keywords: Local features · Feature descriptors · Deep Learning

1 Introduction
Local features play a key role in many Computer Vision applications. Finding and matching them across images has been the subject of vast amounts of research. Until recently, the best techniques relied on carefully hand-crafted features [1–5]. Over the past few years, as in many areas of Computer Vision, methods based on Machine Learning, and more specifically Deep Learning, have started to outperform these traditional methods [6–10]. These new algorithms, however, address only a single step in the complete processing chain, which includes detecting the features, computing their orientation, and extracting robust representations that allow us to match them across images. In this paper we introduce a novel Deep architecture that performs all three steps together. We demonstrate that it achieves better overall performance than the state-of-the-art methods, in large part because it allows these individual steps to be optimized to perform well in conjunction with each other.

K.M. Yi and E. Trulls contributed equally. This work was supported in part by the EU FP7 project MAGELLAN under grant number ICT-FP7-611526.
Electronic supplementary material: The online version of this chapter (doi:10.1007/978-3-319-46466-4_28) contains supplementary material, which is available to authorized users.
© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VI, LNCS 9910, pp. 467–483, 2016. DOI: 10.1007/978-3-319-46466-4_28
Fig. 1. Our integrated feature extraction pipeline. Our pipeline consists of three major components: the Detector, the Orientation Estimator, and the Descriptor. They are tied together with differentiable operations to preserve end-to-end differentiability. (Figures are best viewed in color.)
Our architecture, which we refer to as LIFT for Learned Invariant Feature Transform, is depicted in Fig. 1. It consists of three components that feed into each other: the Detector, the Orientation Estimator, and the Descriptor. Each one is based on Convolutional Neural Networks (CNNs), and patterned after recent ones [6,9,10] that have been shown to perform these individual functions well. To mesh them together we
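The data flow between the three components can be sketched as a single chained computation. The toy NumPy sketch below replaces each CNN with a hand-coded stand-in, so only the pipeline structure is illustrated: a score map yields a (soft-argmax) keypoint location, a patch around it yields an orientation, and the patch yields a unit-norm descriptor. All function names, shapes, and the fixed crop are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softargmax(score_map):
    """Softmax-weighted mean of pixel coordinates.
    A differentiable surrogate for argmax, which is why keypoint
    selection can be trained end-to-end in an architecture like LIFT."""
    w = np.exp(score_map - score_map.max())
    w /= w.sum()
    ys, xs = np.mgrid[0:score_map.shape[0], 0:score_map.shape[1]]
    return float((w * ys).sum()), float((w * xs).sum())

def detect(image):
    # Stand-in for the Detector CNN: produce a score map over the image.
    return image  # toy choice: treat intensities as scores

def estimate_orientation(patch):
    # Stand-in for the Orientation Estimator CNN: one angle per patch,
    # here derived from the mean image gradient.
    gy, gx = np.gradient(patch)
    return float(np.arctan2(gy.mean(), gx.mean()))

def describe(patch):
    # Stand-in for the Descriptor CNN: a fixed-length unit vector.
    v = patch.flatten()
    return v / (np.linalg.norm(v) + 1e-8)

image = np.random.default_rng(0).random((32, 32))
score = detect(image)
y, x = softargmax(score)          # differentiable keypoint location
patch = image[8:24, 8:24]         # crop around keypoint (fixed here for brevity)
theta = estimate_orientation(patch)
desc = describe(patch)            # in LIFT the patch is rotated by theta first
```

In the real network each stand-in is a CNN and the crop/rotation steps are themselves differentiable image-sampling operations, which is what lets gradients flow from the Descriptor back into the Detector.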