Online Bayesian shrinkage regression
ORIGINAL ARTICLE
Waqas Jamil¹ • Abdelhamid Bouchachia¹

¹ Machine Intelligence Group, Department of Computing and Informatics, Bournemouth University, Poole, UK

Received: 7 September 2019 / Accepted: 15 April 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

A shorter version of this paper was presented at the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 2019.
Abstract
The present work introduces a new online regression method that extends the shrinkage via limit of Gibbs sampler (SLOG) approach to the online learning setting. In particular, we show theoretically how the proposed online SLOG (OSLOG) is obtained within the Bayesian framework, without resorting to the Gibbs sampler or to a hierarchical representation. Moreover, in order to establish a performance guarantee for OSLOG, we derive an upper bound on its cumulative squared loss; it is the only online regression algorithm with sparsity that gives a logarithmic regret. Furthermore, we compare OSLOG empirically with two state-of-the-art algorithms, examining three aspects: normality, sparsity and multicollinearity, and showing that it achieves an excellent trade-off between these properties.

Keywords: Regression · Regularisation · Online learning · Competitive analysis
1 Introduction

Offline $L_1$-regularised regression, introduced by Tibshirani [1] and known as the lasso, has been well studied in the past. In the batch setting, the goal is to find the regression model weights $w$, given training data $X$, a label vector $Y$ and a hyper-parameter $\lambda$, by solving:

$$w_{\text{lasso}} = \operatorname*{argmin}_{w \in \mathbb{R}^n} \|Y - Xw\|_2^2 + \lambda \|w\|_1 \qquad (1)$$
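As a hedged illustration of Eq. (1) (not taken from the paper), the following Python sketch fits the lasso objective with scikit-learn; the synthetic data, the value of $\lambda$, and the rescaling of $\lambda$ to scikit-learn's `alpha` convention are assumptions made purely for demonstration.

```python
# Minimal sketch (not from the paper): solving the batch lasso objective of Eq. (1)
# with scikit-learn. Note that sklearn's Lasso minimises
#   (1 / (2 * n_samples)) * ||Y - Xw||_2^2 + alpha * ||w||_1,
# so alpha = lam / (2 * n_samples) matches Eq. (1) up to a constant factor.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))               # training data (illustrative)
w_true = np.array([3.0, -2.0] + [0.0] * 8)   # sparse ground truth (illustrative)
Y = X @ w_true + 0.1 * rng.normal(size=200)

lam = 1.0                                    # hyper-parameter lambda of Eq. (1) (assumed value)
model = Lasso(alpha=lam / (2 * X.shape[0]), fit_intercept=False)
model.fit(X, Y)
w_lasso = model.coef_                        # regression weights; small coefficients shrink to zero
```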
A Bayesian estimation of the lasso weights using the Gibbs sampler was proposed by Park and Casella [2] and later developed further by Rajaratnam et al. [3], resulting in the deterministic Bayesian lasso, better known as SLOG. Multiplying $w_{\text{lasso}}$ by the test data yields predictions in the batch setting.
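SLOG replaces the Gibbs sampler with a deterministic iteration; to convey the flavour, the sketch below implements a reweighted-ridge fixed point of the kind SLOG is built on. The update form, the function name `slog_like` and the parameters `lam`, `n_iter` and `eps` are assumptions for illustration, not the exact recursion of [3], which should be consulted for the precise algorithm.

```python
# Hedged sketch of a SLOG-style deterministic reweighted-ridge fixed point
# (assumed form, not quoted from [3]):
#   V_k = diag(|w_k|),  w_{k+1} = V_k X^T (X V_k X^T + lam * I)^{-1} Y
import numpy as np

def slog_like(X, Y, lam, n_iter=100, eps=1e-12):
    n_samples, _ = X.shape
    w = np.linalg.lstsq(X, Y, rcond=None)[0]   # least-squares initialisation
    for _ in range(n_iter):
        V = np.diag(np.abs(w) + eps)           # eps keeps the linear system well conditioned
        K = X @ V @ X.T + lam * np.eye(n_samples)
        w = V @ X.T @ np.linalg.solve(K, Y)    # reweighted-ridge update
    return w

# Usage, reusing X, Y from the sketch above: w_hat = slog_like(X, Y, lam=1.0)
```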
On the other hand, in online learning predictions are made sequentially. Online learning is useful when the application lends itself to continuous learning (concept drift) [4] or when there is too much data to fit into memory at once. Most of the work related to online $L_1$-regularised regression relies on gradient descent methods (e.g. subgradient, coordinate descent and other proximal algorithms) to compute the estimates of the model weights; see, for example, [5–8]. In contrast, the proposed algorithm learns by updating a covariance matrix. At each trial $T = 1, 2, \ldots$, our learning algorithm receives an input $x_T \in \mathbb{R}^n$, makes a prediction $\gamma_T \in \mathbb{R}$ and then receives the actual output $y_T \in \mathbb{R}$. Arguably, the proposed method might not retain the sparsity properties when implemented with only one pass over the data. Nevertheless, it will exhibit some degree of sparsity; we leave this matter for the latter part of the paper.
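To make the trial protocol concrete, here is a minimal sketch of the prediction loop with a generic covariance-matrix-updating learner (plain online ridge regression via the Sherman-Morrison identity). It is a stand-in for illustration only, not the OSLOG update derived later in the paper; the class name `OnlineRidge`, its parameter `a` and the synthetic data stream are assumptions.

```python
# Minimal sketch of the online protocol: at each trial the learner receives x_T,
# predicts gamma_T, then observes y_T. The learner shown is plain online ridge
# regression maintained via the Sherman-Morrison rank-1 update (a covariance-matrix
# update, but NOT the OSLOG update of the paper).
import numpy as np

class OnlineRidge:
    def __init__(self, n_features, a=1.0):
        self.A_inv = np.eye(n_features) / a   # inverse of (a*I + sum_t x_t x_t^T)
        self.b = np.zeros(n_features)         # running sum of y_t * x_t

    def predict(self, x):
        return float(x @ self.A_inv @ self.b)

    def update(self, x, y):
        Ax = self.A_inv @ x
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)  # Sherman-Morrison update
        self.b += y * x

rng = np.random.default_rng(1)
learner = OnlineRidge(n_features=5)
for T in range(1, 101):
    x_T = rng.normal(size=5)                  # receive input x_T
    gamma_T = learner.predict(x_T)            # make prediction gamma_T
    y_T = x_T @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.05 * rng.normal()  # actual output
    learner.update(x_T, y_T)
```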