Convergence of Stochastic Proximal Gradient Algorithm


Lorenzo Rosasco1,2 · Silvia Villa3 · Bằng Công Vũ4

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract We study the extension of the proximal gradient algorithm where only a stochastic gradient estimate is available and a relaxation step is allowed. We establish convergence rates for function values in the convex case, as well as almost sure convergence and convergence rates for the iterates under further convexity assumptions. Our analysis avoids averaging of the iterates and error summability assumptions, which might not be satisfied in applications, e.g., in machine learning. Our proof technique extends classical ideas from the analysis of deterministic proximal gradient algorithms.

Keywords Proximal methods · Forward–backward splitting algorithm · Stochastic optimization

1 Introduction First order methods are widely applied to solve optimization problems in a variety of areas including machine learning [2,32] and signal processing [15]. In particular, proximal gradient algorithms, also known as forward–backward splitting algorithms [4], are natural in the common situation where the function to be minimized is the sum of a data fitting term and a regularizer. These approaches separate the contribution of each component: at each iteration the proximal operator defined by the

Corresponding author: Silvia Villa, [email protected]
Lorenzo Rosasco, [email protected]
Bằng Công Vũ, [email protected]

1 DIBRIS, Università di Genova, Via Dodecaneso 35, 16146 Genoa, Italy
2 LCSL, Istituto Italiano di Tecnologia and Massachusetts Institute of Technology, Bldg. 46-5155, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
3 Dipartimento di Matematica, Università di Genova, Via Dodecaneso 35, 16146 Genoa, Italy
4 EPFL STI IEL LIONS, ELD 243 (Bâtiment EL), Station 11, 1015 Lausanne, Switzerland


Applied Mathematics & Optimization

nonsmooth term is applied to the gradient descent step for the smooth term. In practice, it is relevant to consider situations where these operations cannot be performed exactly. For example, the case where the proximal operator is known only up to an error has been considered; see [43,50] and references therein. Another example, of interest for this paper, is when only stochastic estimates of the gradients are available. This setting is relevant in large scale statistical learning [10] and more generally in stochastic approximation [27]. The study of stochastic gradient methods goes back to [26,39], and this algorithm has been studied extensively for smooth, strongly convex objective functions [5,19,20,27,34,36]. In this context, a classical idea is to average the iterates to allow for longer step-sizes [35,37]. Stochastic gradient methods have also been extensively studied in the context of online learning [11]. Here, the analysis of the so-called regret allows one to derive convergence rates for the average iterates through a proof technique known as online-to-batch conversion; see, e.g., [44] and references therein. F
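As an illustration of the kind of iteration discussed above, the following sketch applies a stochastic forward–backward step with relaxation to an ℓ1-regularized least-squares problem. This is not the paper's algorithm verbatim: the problem instance, step-size choice, and all function names and parameters are illustrative assumptions.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding), the
    nonsmooth step of forward-backward splitting for an l1 regularizer."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_proximal_gradient(A, b, reg, n_iter=2000, step=0.01,
                                 relax=1.0, seed=0):
    """Minimize (1/2n)||Ax - b||^2 + reg * ||x||_1 using, at each
    iteration, a stochastic gradient estimate from one random data point,
    followed by the proximal step and a relaxation step."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)                      # sample one data point
        g = (A[i] @ x - b[i]) * A[i]             # stochastic gradient of the smooth term
        y = prox_l1(x - step * g, step * reg)    # forward (gradient) then backward (prox) step
        x = x + relax * (y - x)                  # relaxation step (relax = 1 recovers plain iteration)
    return x
```

With `relax = 1` this reduces to the plain stochastic proximal gradient iteration; values in (0, 2) correspond to the relaxed variant the abstract refers to.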