Cross-Dimensional Weighting for Aggregated Deep Convolutional Features
Abstract. We propose a simple and straightforward way of creating powerful image representations via cross-dimensional weighting and aggregation of deep convolutional neural network layer outputs. We first present a generalized framework that encompasses a broad family of approaches and includes cross-dimensional pooling and weighting steps. We then propose specific non-parametric schemes for both spatial- and channel-wise weighting that boost the effect of highly active spatial responses and at the same time regulate burstiness effects. We experiment on different public datasets for image search and show that our approach outperforms the current state-of-the-art for approaches based on pre-trained networks. We also provide an easy-to-use, open source implementation that reproduces our results.
1 Introduction
Visual image search has been evolving rapidly in recent years, with hand-crafted local features giving way to learning-based ones. Deep Convolutional Neural Networks (CNNs) were popularized by the seminal work of Krizhevsky et al. [19] and have been shown to “effortlessly” improve the state-of-the-art in multiple computer vision domains [29], beating many highly optimized, domain-specific approaches. It comes as no surprise that such features, based on deep networks, have recently also dominated the field of visual image search [3–5,29]. Many recent image search approaches are based on deep features; e.g., Babenko et al. [4,5] and Razavian et al. [3,29] proposed different pooling strategies for such features and demonstrated state-of-the-art performance on popular benchmarks for compact image representations, i.e., representations of up to a few hundred dimensions.

Motivated by these advances, in this paper we present a simple and straightforward way of creating powerful image representations via cross-dimensional weighting and aggregation. We place our approach within a general family of approaches for multidimensional aggregation and weighting, and present a specific instantiation that we have thus far found to be most effective on benchmark tasks.

We base our cross-dimensional weighted features on a generic deep convolutional neural network. Since we aggregate outputs of convolutional layers before the fully connected ones, the data layer can be of arbitrary size [20]. We therefore avoid resizing and cropping the input image, allowing images of different aspect ratios to keep their spatial characteristics intact. After extracting deep convolutional features from the last spatial layer of a CNN, we apply weighting both spatially and per channel before sum-pooling to create a final aggregation. We denote features derived after such cross-dimensional weighting and pooling as CroW features. Our contributions can be summarized as follows:

– We present a generalized framework that sketches a family of approaches for aggregating deep convolutional features.

Y. Kalantidis et al. © Springer International Publishing Switzerland 2016. G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part I, LNCS 9913, pp. 685–701, 2016. DOI: 10.1007/978-3-319-46604-0_48
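The pipeline described above — extract a C×H×W tensor of convolutional activations, weight it spatially and per channel, then sum-pool into a C-dimensional descriptor — can be sketched in NumPy as follows. The specific weighting functions here (power-normalized spatial aggregate, log-inverse-sparsity channel weights) are illustrative assumptions in the spirit of the paper, not its exact formulas, and `crow_aggregate` is a hypothetical helper name.

```python
import numpy as np

def crow_aggregate(features, eps=1e-8):
    """Sketch of cross-dimensional weighted sum-pooling.

    features: array of shape (C, H, W) from the last spatial
    layer of a CNN. Spatial weights boost highly active
    locations; channel weights damp bursty channels. The exact
    weighting choices below are illustrative, not the paper's.
    """
    C, H, W = features.shape

    # Spatial weights: total activation across channels at each
    # location, normalized and power-scaled to boost strong
    # responses without letting them dominate.
    S = features.sum(axis=0)                        # (H, W)
    S = np.sqrt(S / (np.linalg.norm(S) + eps))      # (H, W)

    # Channel weights: channels that fire almost everywhere are
    # "bursty"; weigh channels by the log of the inverse of
    # their non-zero-response proportion.
    Q = (features > 0).mean(axis=(1, 2))            # (C,)
    w = np.log((Q.sum() + eps) / (Q + eps))         # (C,)

    # Cross-dimensional weighting, then sum-pool to a C-dim
    # vector and L2-normalize the final descriptor.
    f = (features * S[None, :, :]).sum(axis=(1, 2)) * w
    return f / (np.linalg.norm(f) + eps)
```

Because pooling happens over the spatial axes, the input image (and hence H and W) can be of arbitrary size, which is what allows images of different aspect ratios to be processed without cropping.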