Fast and strong convergence of online learning algorithms
Zheng-Chu Guo1 · Lei Shi2

Received: 3 August 2018 / Accepted: 29 May 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract
In this paper, we study an online learning algorithm without explicit regularization terms. The algorithm is essentially a stochastic gradient descent scheme in a reproducing kernel Hilbert space (RKHS). The polynomially decaying step size in each iteration plays the role of regularization and ensures the generalization ability of the online learning algorithm. We develop a novel capacity-dependent analysis of the performance of the last iterate of the online learning algorithm, which answers an open problem in learning theory. The contribution of this paper is twofold. First, our capacity-dependent analysis leads to sharp convergence rates in the standard mean square distance, improving the results in the literature. Second, we establish, for the first time, strong convergence of the last iterate in the RKHS norm with polynomially decaying step sizes. We demonstrate that the theoretical analysis established in this paper fully exploits the fine structure of the underlying RKHS and thus leads to sharp error estimates for the online learning algorithm.

Keywords Learning theory · Online learning · Capacity dependent error analysis · Strong convergence in an RKHS

Mathematics Subject Classification (2010) 68Q32 · 68T05 · 62J02 · 62L20
Communicated by: Gitta Kutyniok

Lei Shi
[email protected]

Zheng-Chu Guo
[email protected]

1 School of Mathematical Sciences, Zhejiang University, Hangzhou 310027, People's Republic of China
2 Shanghai Key Laboratory for Contemporary Applied Mathematics, School of Mathematical Sciences, Fudan University, Shanghai 200433, People's Republic of China
1 Introduction

Analyzing and processing large-scale data sets has become ubiquitous in the era of big data. Reducing computational complexity and memory requirements is a pivotal consideration when designing learning algorithms for big data. Although batch learning algorithms are well understood in theory and widely used in applications, finding effective solutions in practice remains challenging as the sample size T grows. Generally, if the optimization process in batch learning involves matrix inversion or decomposition, it requires O(T^2) memory and O(T^3) time. Such scalings of algorithmic complexity seriously limit the performance of batch learning on big data. In contrast to batch learning, which must process the whole data set in a batch, online learning processes the data one by one and updates the output on the fly. Inspired by the gradient descent method, online learning iteratively builds an unbiased estimate of the true gradient upon the arrival of a new sample and uses this information to guide the learning process. One distinguishing feature of online learning, compared with its batch counterpart, is its prominent computational speed-up.
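To make this scheme concrete, the sketch below implements a typical unregularized online least-squares update in an RKHS of the form f_{t+1} = f_t - eta_t (f_t(x_t) - y_t) K_{x_t}, with f_1 = 0 and a polynomially decaying step size eta_t = eta_1 t^{-theta}. It is a minimal illustration of this class of algorithms rather than a reproduction of the method analyzed in this paper; the Gaussian kernel, the step-size constants, and the synthetic data are illustrative assumptions.

# Minimal sketch of unregularized online (stochastic gradient) learning in an RKHS,
# assuming the update f_{t+1} = f_t - eta_t * (f_t(x_t) - y_t) * K(x_t, .)
# with polynomially decaying step size eta_t = eta1 * t**(-theta).
# The Gaussian kernel and all constants below are illustrative assumptions.
import numpy as np

def gaussian_kernel(x, z, sigma=0.5):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def online_rkhs_sgd(X, y, eta1=0.5, theta=0.5, kernel=gaussian_kernel):
    """One pass over (X, y); returns centers and coefficients of the last iterate.

    The iterate is stored as a kernel expansion f_t = sum_i c_i K(x_i, .),
    so evaluating f_t(x_t) costs O(t) kernel evaluations at step t.
    """
    centers, coeffs = [], []
    for t, (x_t, y_t) in enumerate(zip(X, y), start=1):
        # Evaluate the current iterate at the new input: f_t(x_t).
        f_xt = sum(c * kernel(x_c, x_t) for x_c, c in zip(centers, coeffs))
        eta_t = eta1 * t ** (-theta)  # polynomially decaying step size
        # The gradient step on the pointwise squared loss adds one kernel term.
        centers.append(x_t)
        coeffs.append(-eta_t * (f_xt - y_t))
    return centers, coeffs

def predict(x, centers, coeffs, kernel=gaussian_kernel):
    return sum(c * kernel(x_c, x) for x_c, c in zip(centers, coeffs))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(500, 1))
    y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(500)
    centers, coeffs = online_rkhs_sgd(X, y)
    print(predict(np.array([0.3]), centers, coeffs))

Under these assumptions, each step touches only the stored kernel expansion, so a single pass over T samples costs O(T^2) kernel evaluations and O(T) memory, in contrast to the O(T^3) time and O(T^2) memory of batch methods based on matrix inversion or decomposition quoted above.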