A Framework of Convergence Analysis of Mini-batch Stochastic Projected Gradient Methods



Jian Gu¹ · Xian-Tao Xiao²

Received: 30 October 2018 / Revised: 11 October 2019 / Accepted: 15 October 2019
© Operations Research Society of China, Periodicals Agency of Shanghai University, Science Press, and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
In this paper, we establish a unified framework to study the almost sure global convergence and the expected convergence rates of a class of mini-batch stochastic (projected) gradient (SG) methods, including two popular variants: SG with diminishing stepsizes and SG with increasing batch sizes. We also show that the standard assumption of uniformly bounded variance, frequently used in the literature to investigate the convergence of SG, is in fact not required when the gradient of the objective function is Lipschitz continuous. Finally, we show that our framework can also be used to analyze the convergence of a mini-batch stochastic extragradient method for stochastic variational inequalities.

Keywords Stochastic projected gradient method · Variance uniformly bounded · Convergence analysis

Mathematics Subject Classification 62L20 · 90C25 · 90C15

This research was supported by the National Natural Science Foundation of China (Nos. 11871135 and 11801054) and the Fundamental Research Funds for the Central Universities (No. DUT19K46).

✉ Xian-Tao Xiao (corresponding author), [email protected]
Jian Gu, [email protected]

1 School of Sciences, Dalian Ocean University, Dalian 116023, China
2 School of Mathematical Sciences, Dalian University of Technology, Dalian 116023, China


1 Introduction

We consider the following stochastic optimization problem:
$$\min_{x \in X} \; f(x) := \mathbb{E}[F(x, \xi)], \tag{1.1}$$
where $X \subseteq \mathbb{R}^n$ is a closed convex set, $\xi$ is a random vector supported on a set $\Xi$ and $F : X \times \Xi \to \mathbb{R}$. We shall assume that the expectation $\mathbb{E}[F(x, \xi)]$ is well defined and finite valued for every $x \in X$, and that the solution set $X^*$ is nonempty. In particular, if the random vector $\xi$ is discretely distributed, i.e., $\Xi = \{\xi_1, \cdots, \xi_N\}$, problem (1.1) reduces to the following finite-sum minimization formulation:
$$\min_{x \in X} \; f(x) := \frac{1}{N} \sum_{i=1}^{N} f_i(x),$$

where $f_i(x) := F(x, \xi_i)$, $i = 1, \cdots, N$. Such a formulation is also known as empirical risk minimization in statistical machine learning [1]; a toy instance is sketched below.
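To make the finite-sum formulation concrete, the following sketch builds a least-squares instance in which each sample $\xi_i = (a_i, b_i)$ defines $f_i(x) = \frac{1}{2}(a_i^{\mathrm{T}} x - b_i)^2$. The data, dimensions and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical least-squares instance of the finite-sum formulation:
# each sample xi_i = (a_i, b_i) gives f_i(x) = 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
N, n = 1000, 10                   # number of samples N, dimension n
A = rng.standard_normal((N, n))   # row i is the feature vector a_i
b = rng.standard_normal(N)        # entry i is the target b_i

def f_i(x, i):
    """Single-sample loss F(x, xi_i)."""
    return 0.5 * (A[i] @ x - b[i]) ** 2

def f(x):
    """Finite-sum objective f(x) = (1/N) * sum_i f_i(x)."""
    return 0.5 * np.mean((A @ x - b) ** 2)
```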

In this paper, we assume that the gradient of the expectation function $f(x)$ cannot be evaluated efficiently, either because the data are high-dimensional or because the distribution of $\xi$ is not fully known. Instead, a stochastic approximation $G(x)$ of the gradient $\nabla f(x)$ can be obtained by Monte Carlo sampling. In this setting, one of the fundamental methods for solving problem (1.1) is the stochastic gradient (SG) method (also known as the stochastic approximation algorithm), which takes the form
$$x^{k+1} = \Pi_X[x^k - \alpha_k G(x^k)], \tag{1.2}$$
where $\alpha_k$ is the stepsize and $\Pi_X(x)$ denotes the projection of $x$ onto $X$; a minimal sketch of this iteration is given below.
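The sketch below runs iteration (1.2) on the least-squares instance above, with $G(x^k)$ computed as a mini-batch sample-average gradient. The feasible set (a Euclidean ball), the diminishing stepsize rule and the batch size are illustrative assumptions, not the paper's prescriptions.

```python
import numpy as np

# Minimal sketch of the mini-batch stochastic projected gradient iteration
# x^{k+1} = Pi_X[x^k - alpha_k * G(x^k)]. The set X, stepsize rule and
# batch size m are illustrative choices, not taken from the paper.
rng = np.random.default_rng(1)
N, n, m = 1000, 10, 32                 # samples, dimension, mini-batch size
A = rng.standard_normal((N, n))        # least-squares data: rows a_i
b = rng.standard_normal(N)             # targets b_i
radius = 5.0                           # X = {x : ||x||_2 <= radius}

def project_onto_ball(x, r):
    """Pi_X for the Euclidean ball: rescale x if it lies outside X."""
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

def minibatch_gradient(x, idx):
    """G(x): sample-average gradient of f over the mini-batch idx."""
    return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)

x = np.zeros(n)
for k in range(500):
    idx = rng.choice(N, size=m, replace=False)  # draw a random mini-batch
    alpha_k = 1.0 / np.sqrt(k + 1)              # diminishing stepsize
    x = project_onto_ball(x - alpha_k * minibatch_gradient(x, idx), radius)

print("final objective f(x):", 0.5 * np.mean((A @ x - b) ** 2))
```

Diminishing the stepsize and increasing the batch size are the two noise-control mechanisms the paper analyzes; the sketch uses the former, and one could instead fix $\alpha_k$ and let the batch size $m$ grow with $k$.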

The origin of SG can be traced back to the paper by Robbins and Monro [2] in 1951. In recent deca