Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods


Nicolas Loizou · Peter Richtárik

Received: 18 January 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.

Keywords Stochastic methods · Heavy ball momentum · Linear systems · Randomized coordinate descent · Randomized Kaczmarz · Stochastic gradient descent · Stochastic Newton · Quadratic optimization · Convex optimization
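To make the object of study concrete, the following is a minimal sketch of the stochastic heavy ball update the abstract refers to (stochastic gradient descent with a momentum term β(x_k − x_{k−1})), applied to a consistent linear system, i.e., a convex quadratic. The sampling rule, the step size alpha, the momentum beta and the function name stochastic_heavy_ball are illustrative choices made here, not the tuned parameters or notation analysed in the paper.

```python
import numpy as np

def stochastic_heavy_ball(A, b, alpha=1.0, beta=0.3, iters=2000, seed=0):
    """SGD with heavy ball momentum for minimizing (1/2n) * ||A x - b||^2 (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x_prev = np.zeros(d)
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)                               # sample one row (one equation) uniformly
        residual_i = A[i] @ x - b[i]                      # scalar residual of equation i
        grad_i = residual_i * A[i]                        # stochastic gradient of (1/2)(a_i^T x - b_i)^2
        step = alpha / (A[i] @ A[i])                      # row-normalised step size (illustrative choice)
        x_next = x - step * grad_i + beta * (x - x_prev)  # gradient step plus heavy ball momentum term
        x_prev, x = x, x_next
    return x

# Usage on a small consistent linear system A x = b.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
x_star = rng.standard_normal(5)
b = A @ x_star
x_hat = stochastic_heavy_ball(A, b)
print(np.linalg.norm(x_hat - x_star))  # should be close to zero
```

With beta = 0 this reduces to a plain stochastic gradient (randomized Kaczmarz-type) step; the momentum term reuses the previous displacement x − x_prev at essentially no extra gradient cost.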

Work done while the first author was a PhD student at the School of Mathematics, The University of Edinburgh.


Nicolas Loizou [email protected]

Extended author information available on the last page of the article



Mathematics Subject Classification 68Q25 · 68W20 · 68W40 · 65Y20 · 90C15 · 90C20 · 90C25 · 15A06 · 15B52 · 65F10

1 Introduction

Two of the most popular algorithmic ideas for solving optimization problems involving big volumes of data are stochastic approximation and momentum. By stochastic approximation we refer to the practice, pioneered by Robbins and Monro [79], of replacing costly-to-compute quantities (e.g., the gradient of the objective function) with cheap-to-compute stochastic approximations thereof (e.g., an unbiased estimate of the gradient). By momentum we refer to the heavy ball technique originally developed by Polyak [68] to accelerate the convergence rate of gradient-type methods. While much is known about the effects of stochastic approximation and momentum in isolation, surprisingly li