Near-Optimal Coresets of Kernel Density Estimates


Jeff M. Phillips1 · Wai Ming Tai1

Received: 22 June 2018 / Revised: 14 August 2019 / Accepted: 27 August 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
We construct near-optimal coresets for kernel density estimates for points in $\mathbb{R}^d$ when the kernel is positive definite. Specifically, we provide a polynomial-time construction for a coreset of size $O(\sqrt{d}/\varepsilon \cdot \sqrt{\log(1/\varepsilon)})$, and we show a near-matching lower bound of size $\Omega(\min\{\sqrt{d}/\varepsilon,\, 1/\varepsilon^2\})$. When $d \ge 1/\varepsilon^2$, it is known that the coreset can have size $O(1/\varepsilon^2)$. The upper bound is a polynomial-in-$(1/\varepsilon)$ improvement when $d \in [3, 1/\varepsilon^2)$, and the lower bound is the first known lower bound to depend on $d$ for this problem. Moreover, the restriction in the upper bound that the kernel be positive definite is significant in that it still covers a wide variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel, which can be negative.

Keywords Coreset · Kernel density estimate · Discrepancy theory

1 Introduction

Kernel density estimates are pervasive objects in data analysis. They are the classic way to estimate a continuous distribution from a finite sample of points [46,47]. With some points negatively weighted, they are the prediction function in kernel SVM classifiers [44]. They are the core of many robust topological reconstruction approaches [7,17,41]; and they arise in many other applications including mode estimation [1], outlier detection [45], regression [16], and clustering [42].

Editor in Charge: Kenneth Clarkson

J.M. Phillips thanks the support of NSF CCF-1350888, IIS-1251019, ACI-1443046, CNS-1514520, and CNS-1564287.

1 University of Utah, Salt Lake City, USA


Consider a dataset $P \subset \mathbb{R}^d$ of size $n$, and a kernel $K : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, for instance the Gaussian kernel $K(x, p) = \exp(-\alpha^2 \|x - p\|^2)$ with $1/\alpha$ as a bandwidth parameter. Then a kernel density estimate is defined at any point $x \in \mathbb{R}^d$ as
$$\mathrm{kde}_P(x) = \frac{1}{n} \sum_{p \in P} K(x, p).$$
Given that it takes $O(n)$ time to evaluate $\mathrm{kde}_P$, and that data sets are growing to massive sizes, a common approach for continuing to use these powerful modeling objects is to replace $P$ with a much smaller data set $Q$ so that $\mathrm{kde}_Q$ approximates $\mathrm{kde}_P$. While statisticians have classically studied various sorts of average deviation ($L_2$ [46,47] or $L_1$ error [13]), for most modern data modeling purposes a worst-case $L_\infty$ guarantee is more relevant (e.g., for preserving classification margins [44], density estimates [51], topology [41], and hypothesis testing on distributions [22]). Specifically, this error guarantee ensures
$$\|\mathrm{kde}_P - \mathrm{kde}_Q\|_\infty = \sup_{x \in \mathbb{R}^d} |\mathrm{kde}_P(x) - \mathrm{kde}_Q(x)| \le \varepsilon.$$
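To make these definitions concrete, the following is a minimal Python sketch (not from the paper): it evaluates the Gaussian kernel density estimate $\mathrm{kde}_P$ and approximates the $L_\infty$ error between $\mathrm{kde}_P$ and $\mathrm{kde}_Q$ by taking the maximum deviation over a finite set of query points, since the supremum over all of $\mathbb{R}^d$ cannot be computed by enumeration. The random subsample used for $Q$ is only a baseline, not the coreset construction studied in this paper, and all function names and parameters are illustrative.

```python
import numpy as np

def kde(P, x, alpha=1.0):
    # Gaussian KDE: kde_P(x) = (1/n) * sum_{p in P} exp(-alpha^2 * ||x - p||^2)
    diffs = P - x                               # shape (n, d)
    sq_dists = np.einsum('ij,ij->i', diffs, diffs)
    return np.exp(-(alpha ** 2) * sq_dists).mean()

def linf_error(P, Q, queries, alpha=1.0):
    # Approximate ||kde_P - kde_Q||_inf by the maximum deviation over a finite query set.
    return max(abs(kde(P, x, alpha) - kde(Q, x, alpha)) for x in queries)

# Baseline only: Q is a uniform random subsample of P, not the paper's construction.
rng = np.random.default_rng(0)
P = rng.normal(size=(10000, 3))
Q = P[rng.choice(len(P), size=200, replace=False)]
queries = rng.normal(size=(500, 3))
print(linf_error(P, Q, queries))
```

Evaluating the error only at sampled query points gives a lower bound on the true $L_\infty$ error; the coreset constructions discussed below guarantee the bound over all of $\mathbb{R}^d$.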

We call such a set $Q$ an $\varepsilon$-KDE coreset of $P$. In this paper we study how small $Q$ can be as a function of the error $\varepsilon$, the dimension $d$, and properties of the kernels.1

1.1 Background on Kernels and Related Coresets

Tradit