Revisiting Additive Quantization

Abstract. We revisit Additive Quantization (AQ), an approach to vector quantization that uses multiple, full-dimensional, and non-orthogonal codebooks. Despite its elegant and simple formulation, AQ has failed to achieve state-of-the-art performance on standard retrieval benchmarks, because the encoding problem, which amounts to MAP inference in multiple fully-connected Markov Random Fields (MRFs), has proven to be hard to solve. We demonstrate that the performance of AQ can be improved to surpass the state of the art by leveraging iterated local search, a stochastic local search approach known to work well for a range of NP-hard combinatorial problems. We further show a direct application of our approach to a recent formulation of vector quantization that enforces sparsity of the codebooks. Unlike previous work, which required specialized optimization techniques, our formulation can be plugged directly into state-of-the-art lasso optimizers. This results in a conceptually simple, easily implemented method that outperforms the previous state of the art in solving sparse vector quantization. Our implementation is publicly available (https://github.com/jltmtz/local-search-quantization).
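To make the encoding problem concrete, the following sketch illustrates additive quantization with iterated local search: a vector is approximated by the sum of one entry from each codebook, codes are refined by iterated conditional modes (ICM), and random perturbations help escape local optima. This is an illustrative sketch, not the authors' implementation; the function names (`aq_decode`, `icm_pass`, `encode_ils`) and all hyperparameters are our own.

```python
import numpy as np

rng = np.random.default_rng(0)


def aq_decode(codes, codebooks):
    # Reconstruct a vector as the sum of one entry from each codebook.
    return sum(cb[c] for c, cb in zip(codes, codebooks))


def icm_pass(x, codes, codebooks):
    # Iterated conditional modes: greedily re-optimize one code at a time,
    # holding the others fixed. Each update cannot increase the error.
    M = len(codebooks)
    for m, cb in enumerate(codebooks):
        residual = x - sum(codebooks[j][codes[j]] for j in range(M) if j != m)
        codes[m] = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
    return codes


def encode_ils(x, codebooks, n_restarts=8, icm_iters=4):
    # Iterated local search: run ICM to a local optimum, record the best
    # solution so far, then perturb one code at random and repeat.
    M, K = len(codebooks), codebooks[0].shape[0]
    best_codes = [int(rng.integers(K)) for _ in range(M)]
    best_err = np.inf
    codes = list(best_codes)
    for _ in range(n_restarts):
        for _ in range(icm_iters):
            codes = icm_pass(x, codes, codebooks)
        err = float(np.sum((x - aq_decode(codes, codebooks)) ** 2))
        if err < best_err:
            best_err, best_codes = err, list(codes)
        # Perturbation: restart from the best codes with one code reassigned.
        codes = list(best_codes)
        codes[int(rng.integers(M))] = int(rng.integers(K))
    return best_codes, best_err
```

With M codebooks of K entries each, a single ICM pass costs O(M·K·D) per vector; the quality/speed trade-off is controlled by `n_restarts` and `icm_iters`.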

1 Introduction

Computer vision applications often involve computing the similarity of many high-dimensional, real-valued image representations, in a process known as feature matching. When large databases of images are used, this results in significant computational bottlenecks. For example, in structure from motion [1], it is common to estimate the relative viewpoint of each image in large collections of photographs by computing the pairwise similarity of several million SIFT [2] descriptors; it is also now common for retrieval and classification datasets to comprise millions of images [3,4]. In practice, the large-scale retrieval problem often translates into large-scale approximate nearest neighbour (ANN) search and has traditionally been addressed with hashing [5,6]. However, a family of methods based on vector quantization has recently demonstrated superior performance and scalability, sparking interest from the machine learning, computer vision and multimedia retrieval communities [7–12]. These methods are all based on multi-codebook quantization (MCQ), a generalization of k-means clustering in which cluster centres arise as the sum of entries from multiple codebooks. Other applications of MCQ include large-scale maximum inner product search (MIPS) [13,14] and the compression of deep neural networks for mobile devices [15]. Like k-means clustering, MCQ is posed as the search for a set of codes and codebooks that best approximate a given dataset. While early approaches to MCQ were designed to enforce codebook orthogonality [9,10], more recent methods make use of non-orthogonal, often full-dimensional codebooks [7,8,11]. A problem faced by these methods

J. Martinez et al., in B. Leibe et al. (Eds.): ECCV 2016, Part II, LNCS 9906, pp. 137–153. © Springer International Publishing AG 2016. DOI: 10.1007/978-3-319-46475-6_9