Deep Image Retrieval: Learning Global Representations for Image Search
Abstract. We propose a novel approach for instance-level image retrieval. It produces a global and compact fixed-length representation for each image by aggregating many region-wise descriptors. In contrast to previous works employing pre-trained deep networks as a black box to produce features, our method leverages a deep architecture trained for the specific task of image retrieval. Our contribution is twofold: (i) we leverage a ranking framework to learn convolution and projection weights that are used to build the region features; and (ii) we employ a region proposal network to learn which regions should be pooled to form the final global descriptor. We show that using clean training data is key to the success of our approach. To that aim, we use a large-scale but noisy landmark dataset and develop an automatic cleaning approach. The proposed architecture produces a global image representation in a single forward pass. Our approach significantly outperforms previous approaches based on global descriptors on standard datasets. It even surpasses most prior works based on costly local descriptor indexing and spatial verification. Additional material is available at www.xrce.xerox.com/Deep-Image-Retrieval.

Keywords: Deep learning · Instance-level retrieval
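To make the pipeline outlined in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes PyTorch with a VGG16 backbone, stands in a fixed grid of boxes for the learned region proposal network, and uses invented names (RegionAggregator, triplet_ranking_loss) and an arbitrary margin. Region activations are max-pooled, l2-normalized, projected with learned weights, sum-aggregated into one fixed-length global descriptor, and supervised with a triplet ranking loss.

```python
# Minimal sketch (not the authors' code) of aggregating region-wise CNN
# descriptors into a single global image descriptor, plus a triplet ranking
# loss. PyTorch, the VGG16 backbone, the fixed region grid, and all names
# below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class RegionAggregator(nn.Module):
    """Max-pool CNN activations inside each region, project, and sum-aggregate."""

    def __init__(self, feat_dim=512, out_dim=512):
        super().__init__()
        # Learned projection (the "projection weights" of the abstract); the
        # paper initializes a similar layer with PCA, which this sketch skips.
        self.proj = nn.Linear(feat_dim, out_dim)

    def forward(self, fmap, boxes):
        # fmap: (C, H, W) activations of the last conv layer for one image.
        # boxes: list of (x0, y0, x1, y1) coordinates on the feature map.
        region_vecs = []
        for (x0, y0, x1, y1) in boxes:
            crop = fmap[:, y0:y1, x0:x1]                 # activations inside the region
            v = crop.amax(dim=(1, 2))                    # max-pool over the region
            v = F.normalize(self.proj(F.normalize(v, dim=0)), dim=0)
            region_vecs.append(v)
        g = torch.stack(region_vecs).sum(dim=0)          # sum the region descriptors
        return F.normalize(g, dim=0)                     # compact global descriptor


def triplet_ranking_loss(q, pos, neg, margin=0.1):
    """Ranking loss: the query must be closer to the positive than to the negative."""
    d_pos = (q - pos).pow(2).sum()
    d_neg = (q - neg).pow(2).sum()
    return F.relu(margin + d_pos - d_neg)


if __name__ == "__main__":
    # Toy usage: untrained VGG16 features and a fixed 2x2 grid of regions.
    backbone = torchvision.models.vgg16(weights=None).features.eval()
    imgs = torch.rand(3, 3, 224, 224)                    # query, positive, negative
    with torch.no_grad():
        fmaps = backbone(imgs)                           # (3, 512, 7, 7)
    boxes = [(0, 0, 4, 4), (3, 0, 7, 4), (0, 3, 4, 7), (3, 3, 7, 7)]
    aggregator = RegionAggregator()
    q, p, n = (aggregator(f, boxes) for f in fmaps)
    print(triplet_ranking_loss(q, p, n))
```

In the system described in the paper, the fixed grid above would be replaced by regions predicted by a region proposal network, and the convolution and projection weights are learned jointly under the ranking objective, so that the global descriptor is produced in a single forward pass.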
1 Introduction
Since their ground-breaking results on image classification in recent ImageNet challenges [29,50], deep learning based methods have shone in many other computer vision tasks, including object detection [14] and semantic segmentation [31]. Recently, they have also rekindled interest in highly semantic tasks such as image captioning [12,28] and visual question answering [1]. However, for some problems such as instance-level image retrieval, deep learning methods have led to rather underwhelming results. In fact, for most image retrieval benchmarks, the state of the art is currently held by conventional methods relying on local descriptor matching and re-ranking with elaborate spatial verification [30,34,58,59].

Recent works leveraging deep architectures for image retrieval are mostly limited to using a pre-trained network as a local feature extractor. Most efforts have been devoted to designing image representations suitable for image retrieval on top of those features. This is challenging because representations for
retrieval need to be compact while retaining most of the fine details of the images. Contributions have been made to allow deep architectures to accurately represent input images of different sizes and aspect ratios [5,27,60] or to address the lack of geometric invariance of convolutional neural network (CNN) features [15,48]. In this paper, we focus on learning these representations. We argue that one of the main reasons for the deep methods lagging behind the state of the art is the lack of supervised learning for the specific task of instance-level image retrieval. At the core of