Non-parametric Jensen-Shannon Divergence

Abstract. Quantifying the difference between two distributions is a common problem in many machine learning and data mining tasks. What is also common in many tasks is that we only have empirical data. That is, we do not know the true distributions nor their form, and hence, before we can measure their divergence we first need to assume a distribution or perform estimation. For exploratory purposes this is unsatisfactory, as we want to explore the data, not our expectations. In this paper we study how to non-parametrically measure the divergence between two distributions. More specifically, we formalise the well-known Jensen-Shannon divergence using cumulative distribution functions. This allows us to calculate divergences directly and efficiently from data without the need for estimation. Moreover, empirical evaluation shows that our method performs very well in detecting differences between distributions, outperforming the state of the art in both statistical power and efficiency for a wide range of tasks.

1 Introduction

Measuring the difference between two distributions – their divergence – is a key element of many data analysis tasks. Let us consider a few examples. In time series analysis, for instance, to detect either changes or anomalies we need to quantify how differently the data in two windows are distributed [18,23]. In discretisation, if we want to maintain interactions, we should only merge bins when their multivariate distributions are similar [13]. In subgroup discovery, the quality of a subgroup depends on how much the distribution of its targets deviates from that of its complement data set [3,6].

To optimally quantify the divergence of two distributions we need the actual distributions. Particularly for exploratory tasks, however, we typically only have access to empirical data. That is, we do not know the actual distribution, nor even its form. This is especially true for real-valued data. Although we can always make parametric assumptions, or estimate the distributions by kernel density estimation (KDE), neither is ideal in practice. For example, both parametric and KDE methods are prone to the curse of dimensionality [22]. More importantly, they restrict our analysis to the specific types of distributions or kernels used. That is, if we are not careful we are exploring our expectations about the data, not the data itself. To stay as close to the data as possible, we hence study a non-parametric divergence measure.

In particular, we propose cjs, an information-theoretic divergence measure for numerical data. We build it upon the well-known Jensen-Shannon (js) divergence. Yet, while the latter works with probability distribution functions (pdfs), which need to be estimated, we consider cumulative distribution functions (cdfs), which can be obtained directly from data. cjs has many appealing properties. In a
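For intuition, recall that the Jensen-Shannon divergence between two densities p and q is JS(p, q) = 1/2 KL(p || m) + 1/2 KL(q || m), where m = (p + q)/2 and KL denotes the Kullback-Leibler divergence. The Python sketch below contrasts a density-based estimate of this quantity with a JS-style quantity in which empirical cdfs, read off directly from the two samples, take the place of estimated pdfs. It is only an illustrative stand-in under our own simplifying assumptions, not the cjs measure defined in this paper; the function names (empirical_cdf, js_divergence_hist, cdf_js_divergence) and the Gaussian test samples are invented for the example.

import numpy as np

def empirical_cdf(sample, grid):
    # Empirical CDF of `sample`, evaluated at the points in `grid`.
    sample = np.sort(sample)
    return np.searchsorted(sample, grid, side="right") / len(sample)

def js_divergence_hist(x, y, bins=20):
    # Classical Jensen-Shannon divergence between two samples, using
    # histogram density estimates -- a binning must be chosen up front.
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    p, _ = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log 0 is taken as 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cdf_js_divergence(x, y):
    # CDF-based analogue: plug empirical CDFs into the JS formula and
    # integrate over the merged sample grid. No density estimation and
    # no parameters, but this is only an illustrative stand-in for cjs.
    grid = np.sort(np.concatenate([x, y]))
    widths = np.diff(grid, append=grid[-1])  # step widths for integration
    P = empirical_cdf(x, grid)
    Q = empirical_cdf(y, grid)
    M = 0.5 * (P + Q)

    def term(A):
        mask = A > 0
        return np.sum(A[mask] * np.log2(A[mask] / M[mask]) * widths[mask])

    return 0.5 * term(P) + 0.5 * term(Q)

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=500)
b = rng.normal(0.5, 1.0, size=500)
print(js_divergence_hist(a, b))   # estimate via histogram densities
print(cdf_js_divergence(a, b))    # computed directly from empirical CDFs

The histogram variant depends on the chosen bin count, whereas the cdf-based variant has no estimation step at all; that is the practical advantage of working with cumulative distribution functions that the abstract refers to.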