A Framework of Convergence Analysis of Mini-batch Stochastic Projected Gradient Methods



Jian Gu¹ · Xian-Tao Xiao²

Received: 30 October 2018 / Revised: 11 October 2019 / Accepted: 15 October 2019
© Operations Research Society of China, Periodicals Agency of Shanghai University, Science Press, and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
In this paper, we establish a unified framework to study the almost sure global convergence and the expected convergence rates of a class of mini-batch stochastic (projected) gradient (SG) methods, including two popular variants: SG with diminishing stepsizes and SG with increasing batch sizes. We also show that the standard assumption of uniformly bounded variance, frequently used in the literature to investigate the convergence of SG, is in fact not required when the gradient of the objective function is Lipschitz continuous. Finally, we show that our framework can also be used to analyze the convergence of a mini-batch stochastic extragradient method for stochastic variational inequalities.

Keywords Stochastic projected gradient method · Variance uniformly bounded · Convergence analysis

Mathematics Subject Classification 62L20 · 90C25 · 90C15

This research was supported by the National Natural Science Foundation of China (Nos. 11871135 and 11801054) and the Fundamental Research Funds for the Central Universities (No. DUT19K46).

✉ Xian-Tao Xiao (corresponding author), [email protected]
Jian Gu, [email protected]

1 School of Sciences, Dalian Ocean University, Dalian 116023, China
2 School of Mathematical Sciences, Dalian University of Technology, Dalian 116023, China


1 Introduction

We consider the following stochastic optimization problem:
$$\min_{x \in X} \; f(x) := \mathbb{E}[F(x, \xi)], \tag{1.1}$$
where $X \subseteq \mathbb{R}^n$ is a closed convex set, $\xi$ is a random vector supported on a set $\Xi$ and $F : X \times \Xi \to \mathbb{R}$. We shall assume that the expectation $\mathbb{E}[F(x, \xi)]$ is well defined and finite valued for every $x \in X$, and that the solution set $X^*$ is nonempty. In particular, if the random vector $\xi$ is discretely distributed, i.e., $\Xi = \{\xi_1, \cdots, \xi_N\}$, problem (1.1) reduces to the following finite-sum minimization formulation:
$$\min_{x \in X} \; f(x) := \frac{1}{N} \sum_{i=1}^{N} f_i(x),$$

where $f_i(x) := F(x, \xi_i)$, $i = 1, \cdots, N$. Such a formulation is also known as empirical risk minimization in statistical machine learning [1]; a toy instance is sketched below.
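To make the finite-sum formulation concrete, the following sketch builds a least-squares instance in which each sample $\xi_i = (a_i, b_i)$ defines $f_i(x) = \frac{1}{2}(a_i^{\mathrm{T}} x - b_i)^2$. The data, dimensions and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical least-squares instance of the finite-sum formulation:
# each sample xi_i = (a_i, b_i) gives f_i(x) = 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
N, n = 1000, 10                   # number of samples N, dimension n
A = rng.standard_normal((N, n))   # row i is the feature vector a_i
b = rng.standard_normal(N)        # entry i is the target b_i

def f_i(x, i):
    """Single-sample loss F(x, xi_i)."""
    return 0.5 * (A[i] @ x - b[i]) ** 2

def f(x):
    """Finite-sum objective f(x) = (1/N) * sum_i f_i(x)."""
    return 0.5 * np.mean((A @ x - b) ** 2)
```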

In this paper, we assume that the gradient of the expectation function $f(x)$ cannot be evaluated efficiently, either because the data are high-dimensional or because the distribution of $\xi$ is not fully known. Instead, a stochastic approximation $G(x)$ of the gradient $\nabla f(x)$ can be obtained by Monte Carlo sampling. In this setting, one of the fundamental methods for solving problem (1.1) is the stochastic gradient (SG) method (also known as the stochastic approximation algorithm), which takes the form
$$x^{k+1} = \Pi_X[x^k - \alpha_k G(x^k)], \tag{1.2}$$
where $\alpha_k$ is the stepsize and $\Pi_X(x)$ denotes the projection of $x$ onto $X$; a minimal sketch of this iteration is given below.
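The sketch below runs iteration (1.2) on the least-squares instance above, with $G(x^k)$ computed as a mini-batch sample-average gradient. The feasible set (a Euclidean ball), the diminishing stepsize rule and the batch size are illustrative assumptions, not the paper's prescriptions.

```python
import numpy as np

# Minimal sketch of the mini-batch stochastic projected gradient iteration
# x^{k+1} = Pi_X[x^k - alpha_k * G(x^k)]. The set X, stepsize rule and
# batch size m are illustrative choices, not taken from the paper.
rng = np.random.default_rng(1)
N, n, m = 1000, 10, 32                 # samples, dimension, mini-batch size
A = rng.standard_normal((N, n))        # least-squares data: rows a_i
b = rng.standard_normal(N)             # targets b_i
radius = 5.0                           # X = {x : ||x||_2 <= radius}

def project_onto_ball(x, r):
    """Pi_X for the Euclidean ball: rescale x if it lies outside X."""
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

def minibatch_gradient(x, idx):
    """G(x): sample-average gradient of f over the mini-batch idx."""
    return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)

x = np.zeros(n)
for k in range(500):
    idx = rng.choice(N, size=m, replace=False)  # draw a random mini-batch
    alpha_k = 1.0 / np.sqrt(k + 1)              # diminishing stepsize
    x = project_onto_ball(x - alpha_k * minibatch_gradient(x, idx), radius)

print("final objective f(x):", 0.5 * np.mean((A @ x - b) ** 2))
```

Diminishing the stepsize and increasing the batch size are the two noise-control mechanisms the paper analyzes; the sketch uses the former, and one could instead fix $\alpha_k$ and let the batch size $m$ grow with $k$.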

The origin of SG can be traced back to the paper by Robbins and Monro [2] in 1951. In recent deca