An Efficient Sum Query Algorithm for Distance-Based Locally Dominating Functions

  • PDF / 3,176,700 Bytes
  • 17 Pages / 439.37 x 666.142 pts Page_size
  • 48 Downloads / 161 Views

DOWNLOAD

REPORT


An Efficient Sum Query Algorithm for Distance‑Based Locally Dominating Functions Ziyun Huang1   · Jinhui Xu2 Received: 17 January 2018 / Accepted: 21 February 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract In this paper, we consider the following sum query problem: Given a point set P in ℝd , and a distance-based function f(p,  q) (i.e., a function of the distance between p and q) satisfying some general properties, the goal is to develop a data structure and a query algorithm for efficiently computing a (1 + 𝜖)-approximate solution to ∑ the sum p∈P f (p, q) for any query point q ∈ ℝd and any small constant 𝜖 > 0 . Existing techniques for this problem are mainly based on some core-set techniques which often have difficulties to deal with functions with local domination property. Based on several new insights to this problem, we develop in this paper a novel technique to overcome these encountered difficulties. Our algorithm is capable of answer̃ 𝜖,d (n0.5+c ) , and the ing queries with high success probability in time no more than O 1+c ̃ underlying data structure can be constructed in O𝜖,d (n ) time for any c > 0 , where the hidden constant has only polynomial dependence on 1∕𝜖 and d. Our technique is simple and can be easily implemented for practical purpose. Keywords  Sum query · Distance-based function · Local domination · High dimensions · Data structure

1 Introduction In this paper, we consider the following sum query problem: Given a set P of points in ℝd (where the dimensionality d could be very high) and a function f(, ), the sum query problem is to build a data structure for P so that the sum of * Ziyun Huang [email protected] Jinhui Xu [email protected] 1

Department of Computer Science and Software Engineering, Penn State Erie, The Behrend College, Erie, USA

2

Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, USA



13

Vol.:(0123456789)

Algorithmica p∈P f (p, q) can be efficiently computed or approximated for any query point q in ℝd  , where f(p, q) is a non-negative distance-based function. We say that f(p, q) is distance-based if the value of f(p, q) depends only on the distance between p and q. In other words, f(p, q) can be written as F(‖p, q‖) for some non-negative real function F(⋅). The distance-based sum query problem are frequently encountered in applications. A good example is kernel density estimation [17], which is an important problem in machine learning. Given a point set P in ℝd , and a non-negative kernel function K(x, y) for any point pair x, y ∈ ℝd (such as the frequently used Gaussian kernel function K(x, y) = exp(−‖x − y‖2 ∕𝜎 2 ) and the Laplacian kernel function K(x, y) = exp(−‖x − y‖∕𝜎) ), the goal of kernel density estimation is to preprocess P ∑ so that given any q ∈ ℝd as a query, the value of p∈P K(p, q)∕�P� can be estimated efficiently. In applications, K(x, y) is often “shift invariant” or distance-based (i.e., K(x, y) is a function of ‖x − y‖ ). The sum query problem also appears in man