EI-LSH: An early-termination driven I/O efficient incremental c -approximate nearest neighbor search
- PDF / 1,414,720 Bytes
- 21 Pages / 595.276 x 790.866 pts Page_size
- 10 Downloads / 200 Views
REGULAR PAPER
EI-LSH: An early-termination driven I/O efficient incremental c-approximate nearest neighbor search Wanqi Liu1,2 · Hanchen Wang1,2
· Ying Zhang2 · Wei Wang3 · Lu Qin2 · Xuemin Lin3
Received: 21 November 2019 / Revised: 6 August 2020 / Accepted: 1 September 2020 © Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract Nearest neighbor in high-dimensional space has been widely used in various fields such as databases, data mining and machine learning. The problem has been well solved in low-dimensional space. However, when it comes to high-dimensional space, due to the curse of dimensionality, the problem is challenging. As a trade-off between accuracy and efficiency, c-approximate nearest neighbor (c-ANN) is considered instead of an exact NN search in high-dimensional space. A variety of c-ANN algorithms have been proposed, one of the important schemes for the c-ANN problem is called Locality-sensitive hashing (LSH), which projects a high-dimensional dataset into a low-dimensional dataset and can return a c-ANN with a constant probability. In this paper, we propose a new aggressive early-termination (ET) condition which stops the algorithm with LSH scheme earlier under the same theoretical guarantee, leading to a smaller I/O cost and less running time. Unlike the “conservative” early termination conditions used in previous studies, we propose an “aggressive” early termination condition which can stop much earlier. Though it is not absolutely safe and may result in the probability of failure, we can still devise more efficient algorithms under the same theoretical guarantee by carefully considering the failure probabilities brought by LSH scheme and early termination. Furthermore, we also introduce an incremental searching strategy. Unlike the previous LSH methods, which expand the bucket width in an exponential way, we employ a more natural search strategy to incrementally access the hash values of the objects. We also provide a rigorous theoretical analysis to underpin our incremental search strategy and the new early termination technique. Our comprehensive experiment results show that, compared with the stateof-the-art I/O efficient c-ANN techniques, our proposed algorithm, namely EI-LSH, can achieve much better I/O efficiency under the same theoretical guarantee. Keywords Locality-sensitive hashing · Approximate nearest neighbor search · Similarity search · I/O efficient algorithm
1 Introduction
B
Hanchen Wang [email protected] Wanqi Liu [email protected] Ying Zhang [email protected] Wei Wang [email protected] Lu Qin [email protected] Xuemin Lin [email protected]
1
Zhejiang Gongshang University, Hangzhou, China
2
AAII, University of Technology, Sydney, Australia
3
The University of New South Wales, Sydney, Australia
Given a set of d-dimensional objects (points) and a query object (point), nearest neighbor (NN) search finds the object which has the smallest distance to the query object. This is a fundamental and essential operation in the applicati
Data Loading...