Online Bayesian shrinkage regression
ORIGINAL ARTICLE
Waqas Jamil¹ • Abdelhamid Bouchachia¹

¹ Machine Intelligence Group, Department of Computing and Informatics, Bournemouth University, Poole, UK

Received: 7 September 2019 / Accepted: 15 April 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

A shorter version of this paper was presented at the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 2019.
Abstract
The present work introduces a new online regression method that extends the shrinkage via limit of Gibbs sampler (SLOG) approach to the online learning setting. In particular, we show theoretically how the proposed online SLOG (OSLOG) is obtained within the Bayesian framework, without resorting to the Gibbs sampler or to a hierarchical representation. Moreover, in order to establish a performance guarantee for OSLOG, we derive an upper bound on its cumulative squared loss; it is the only online regression algorithm with sparsity that gives a logarithmic regret. Furthermore, we compare OSLOG empirically with two state-of-the-art algorithms, examining three aspects: normality, sparsity and multicollinearity, and showing that it achieves an excellent trade-off between these properties.

Keywords: Regression · Regularisation · Online learning · Competitive analysis
1 Introduction

Offline $L_1$-regularised regression, introduced by Tibshirani [1] and known as the lasso, has been well studied in the past. In the batch setting, the goal is to find the regression model weights $w$, given training data $X$, a label vector $Y$ and a hyper-parameter $\lambda$, by solving:

$$w_{\text{lasso}} = \operatorname*{argmin}_{w \in \mathbb{R}^n} \|Y - Xw\|_2^2 + \lambda \|w\|_1 \qquad (1)$$
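As a hedged illustration of Eq. (1) (not taken from the paper), the following Python sketch fits the lasso objective with scikit-learn; the synthetic data, the value of $\lambda$, and the rescaling of $\lambda$ to scikit-learn's `alpha` convention are assumptions made purely for demonstration.

```python
# Minimal sketch (not from the paper): solving the batch lasso objective of Eq. (1)
# with scikit-learn. Note that sklearn's Lasso minimises
#   (1 / (2 * n_samples)) * ||Y - Xw||_2^2 + alpha * ||w||_1,
# so alpha = lam / (2 * n_samples) matches Eq. (1) up to a constant factor.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))               # training data (illustrative)
w_true = np.array([3.0, -2.0] + [0.0] * 8)   # sparse ground truth (illustrative)
Y = X @ w_true + 0.1 * rng.normal(size=200)

lam = 1.0                                    # hyper-parameter lambda of Eq. (1) (assumed value)
model = Lasso(alpha=lam / (2 * X.shape[0]), fit_intercept=False)
model.fit(X, Y)
w_lasso = model.coef_                        # regression weights; small coefficients shrink to zero
```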
A Bayesian estimation of the lasso weights using the Gibbs sampler was proposed by Park and Casella [2] and later developed further by Rajaratnam et al. [3], resulting in the deterministic Bayesian lasso, better known as SLOG. Multiplying $w_{\text{lasso}}$ by the test data yields predictions in the batch setting.
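SLOG replaces the Gibbs sampler with a deterministic iteration; to convey the flavour, the sketch below implements a reweighted-ridge fixed point of the kind SLOG is built on. The update form, the function name `slog_like` and the parameters `lam`, `n_iter` and `eps` are assumptions for illustration, not the exact recursion of [3], which should be consulted for the precise algorithm.

```python
# Hedged sketch of a SLOG-style deterministic reweighted-ridge fixed point
# (assumed form, not quoted from [3]):
#   V_k = diag(|w_k|),  w_{k+1} = V_k X^T (X V_k X^T + lam * I)^{-1} Y
import numpy as np

def slog_like(X, Y, lam, n_iter=100, eps=1e-12):
    n_samples, _ = X.shape
    w = np.linalg.lstsq(X, Y, rcond=None)[0]   # least-squares initialisation
    for _ in range(n_iter):
        V = np.diag(np.abs(w) + eps)           # eps keeps the linear system well conditioned
        K = X @ V @ X.T + lam * np.eye(n_samples)
        w = V @ X.T @ np.linalg.solve(K, Y)    # reweighted-ridge update
    return w

# Usage, reusing X, Y from the sketch above: w_hat = slog_like(X, Y, lam=1.0)
```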
On the other hand, in online learning predictions are made sequentially. Online learning is useful when the application lends itself to continuous learning (concept drift) [4] or when there is too much data to fit into memory at once. Most of the work related to online $L_1$-regularised regression relies on gradient descent methods (e.g. subgradient, coordinate descent and other proximal algorithms) to compute the estimates of the model weights; see, for example, [5–8]. In contrast, the proposed algorithm learns by updating a covariance matrix. At each trial $T = 1, 2, \ldots$, our learning algorithm receives an input $x_T \in \mathbb{R}^n$, makes a prediction $\gamma_T \in \mathbb{R}$ and then receives the actual output $y_T \in \mathbb{R}$. Arguably, the proposed method might not retain the sparsity properties when implemented with only one pass over the data. Nevertheless, it will exhibit some degree of sparsity; we leave this matter for the latter part of the paper.
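To make the trial protocol concrete, here is a minimal sketch of the prediction loop with a generic covariance-matrix-updating learner (plain online ridge regression via the Sherman-Morrison identity). It is a stand-in for illustration only, not the OSLOG update derived later in the paper; the class name `OnlineRidge`, its parameter `a` and the synthetic data stream are assumptions.

```python
# Minimal sketch of the online protocol: at each trial the learner receives x_T,
# predicts gamma_T, then observes y_T. The learner shown is plain online ridge
# regression maintained via the Sherman-Morrison rank-1 update (a covariance-matrix
# update, but NOT the OSLOG update of the paper).
import numpy as np

class OnlineRidge:
    def __init__(self, n_features, a=1.0):
        self.A_inv = np.eye(n_features) / a   # inverse of (a*I + sum_t x_t x_t^T)
        self.b = np.zeros(n_features)         # running sum of y_t * x_t

    def predict(self, x):
        return float(x @ self.A_inv @ self.b)

    def update(self, x, y):
        Ax = self.A_inv @ x
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)  # Sherman-Morrison update
        self.b += y * x

rng = np.random.default_rng(1)
learner = OnlineRidge(n_features=5)
for T in range(1, 101):
    x_T = rng.normal(size=5)                  # receive input x_T
    gamma_T = learner.predict(x_T)            # make prediction gamma_T
    y_T = x_T @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.05 * rng.normal()  # actual output
    learner.update(x_T, y_T)
```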