Polysemous Codes
Abstract. This paper considers the problem of approximate nearest neighbor search in the compressed domain. We introduce polysemous codes, which offer both the distance estimation quality of product quantization and the efficient comparison of binary codes with Hamming distance. Their design is inspired by algorithms introduced in the 1990s to construct channel-optimized vector quantizers. At search time, this dual interpretation accelerates the search: most of the indexed vectors are filtered out with the Hamming distance, so that only a fraction of the vectors need to be ranked with an asymmetric distance estimator. The method is complementary to a coarse partitioning of the feature space such as the inverted multi-index. This is shown by our experiments performed on several public benchmarks, such as the BIGANN dataset comprising one billion vectors, for which we report state-of-the-art results for query times below 0.3 ms per core. Last but not least, our approach allows the approximate computation of the k-NN graph associated with the Yahoo Flickr Creative Commons 100M collection, described by CNN image descriptors, in less than 8 hours on a single machine.
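The following is a minimal sketch of the two-stage search the abstract describes: a cheap Hamming filter followed by asymmetric-distance re-ranking. All names, the 64-bit code size, the fixed threshold, and the 8x256 lookup-table layout are illustrative assumptions, not the paper's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hamming distance between two 64-bit codes (xor + popcnt).
static inline int hamming(uint64_t a, uint64_t b) {
    return __builtin_popcountll(a ^ b);
}

// Asymmetric distance: sum of precomputed per-sub-quantizer lookups.
// tables[m][k] holds the squared distance from the m-th query
// sub-vector to the k-th centroid of sub-quantizer m (assumed layout).
static float adc(uint64_t code, const float tables[8][256]) {
    float d = 0.f;
    for (int m = 0; m < 8; ++m)
        d += tables[m][(code >> (8 * m)) & 0xFF];
    return d;
}

// Return the index of the best match: filter with the Hamming
// distance, rank only the surviving codes with the asymmetric
// estimator.
int search(uint64_t query_code, const float tables[8][256],
           const std::vector<uint64_t>& db, int ham_threshold) {
    int best = -1;
    float best_d = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < db.size(); ++i) {
        if (hamming(query_code, db[i]) > ham_threshold)
            continue;                     // filtered out by the cheap test
        float d = adc(db[i], tables);     // costlier, more accurate estimate
        if (d < best_d) { best_d = d; best = static_cast<int>(i); }
    }
    return best;
}
```

The point of the construction is that the same code is read twice, once as a bit string and once as a sequence of quantization indices, so no extra memory is spent on the filter.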
1 Introduction
Nearest neighbor search, or more generally similarity search, has received sustained attention from different research communities over the last decades. The computer vision community has been especially active on this subject, which is of utmost importance when dealing with very large visual collections. While early approximate nearest neighbor (ANN) methods mainly optimized the trade-off between speed and accuracy, many recent works [4,27,37,45] put memory requirements forward as a central criterion, for several reasons. For instance, due to the memory hierarchy, using less memory means using faster memory: disks are slower than main memory, and main memory is slower than CPU caches. Accessing the memory may therefore be the bottleneck of the search, so algorithms using compact codes are likely to be more efficient than those relying on full vectors. For these reasons, we focus on ANN search with compact codes, which makes it possible to search vector sets comprising as many as one billion vectors on a single machine.

We distinguish two separate lines of research in ANN with compact codes. The first class of methods maps the original vectors to the Hamming hypercube [33,37,49,51]. The resulting bit-vectors are efficiently compared
with the Hamming distance thanks to optimized low-level processor instructions such as xor and popcnt, available on both CPUs and GPUs. Another increasingly popular approach [5,18,27,39,55,56] is to adopt a quantization point of view to achieve better distance estimation for a given code size. While these two classes of approaches are often seen as contenders, both have their advantages and drawbacks: binary codes offer a faster elementary distance computation, while quantization-based codes provide more accurate distance estimates.
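To make the quantization view concrete, here is a hedged sketch of how the per-query lookup tables consumed by the asymmetric estimator above can be built. The 8-sub-quantizer, 256-centroid product-quantizer layout and the flat centroid array are assumptions made for illustration; in practice the codebooks come from k-means training.

```cpp
#include <cstddef>

// Precompute, once per query, the squared L2 distances from each
// query sub-vector to every centroid of the corresponding
// sub-quantizer. tables[m][k] then feeds the table lookups of the
// asymmetric distance estimator.
void build_adc_tables(const float* query,      // d-dimensional query
                      const float* centroids,  // [8][256][d/8], flattened
                      std::size_t d,
                      float tables[8][256]) {
    const std::size_t dsub = d / 8;            // sub-vector dimension
    for (int m = 0; m < 8; ++m) {
        const float* q = query + m * dsub;
        for (int k = 0; k < 256; ++k) {
            const float* c =
                centroids + (static_cast<std::size_t>(m) * 256 + k) * dsub;
            float dist = 0.f;
            for (std::size_t j = 0; j < dsub; ++j) {
                const float diff = q[j] - c[j];
                dist += diff * diff;           // squared L2 to centroid k
            }
            tables[m][k] = dist;
        }
    }
}
```

With 8 sub-quantizers of 256 centroids each, a database vector is encoded in 64 bits, and estimating its distance to the query costs only eight table lookups and additions, which is what makes the quantization view competitive with binary codes despite the richer distance estimate.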