A linearly convergent stochastic recursive gradient method for convex optimization
Yan Liu^{1,2} · Xiao Wang^{1,2} · Tiande Guo^{1,2}

Received: 4 April 2019 / Accepted: 8 February 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

1 School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
2 Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  The stochastic recursive gradient algorithm (SARAH) (Nguyen et al., in: Neural Information Processing Systems, pp 2613–2621, 2017) has attracted much interest recently. It admits a simple recursive framework for updating stochastic gradient estimates. Motivated by this, in this paper we propose a new stochastic recursive gradient method, called SARAH-I. Different from SARAH, SARAH-I incorporates an importance sampling strategy and computes the full gradient at the last iterate in each inner loop. We show that the sequence of distances between the iterates and the optimal set converges linearly under both strong convexity and non-strong convexity conditions. Furthermore, we propose to use the Barzilai–Borwein (BB) approach to adaptively compute step sizes for SARAH-I, and name the resulting method SARAH-I-BB. We establish its convergence and complexity properties in different cases. Finally, numerical tests are reported, indicating the promising performance of the proposed algorithms.

Keywords  Stochastic optimization · Stochastic gradient · BB method · Linear convergence rate · Complexity
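As a brief reminder of the cited framework (this display is added for readability and is not part of the original abstract; the notation is illustrative), SARAH maintains a recursive stochastic gradient estimate of the form

$$v_t = \nabla f_{i_t}(w_t) - \nabla f_{i_t}(w_{t-1}) + v_{t-1}, \qquad w_{t+1} = w_t - \eta\, v_t,$$

where $i_t$ is a randomly sampled index, $\eta > 0$ is a step size, and each inner loop is initialized with an exact full gradient $v_0$ and the step $w_1 = w_0 - \eta v_0$.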
1 Introduction

In the context of large-scale machine learning, the following type of optimization problem is widely considered:
$$\min_{w \in \mathbb{R}^d} \; P(w) = \frac{1}{n} \sum_{i=1}^{n} f_i(w), \qquad (1)$$
where $n$ is the sample size and each $f_i$, $i \in \{1, 2, \ldots, n\}$, is continuously differentiable. Problem (1) is challenging when $n$ is extremely large: the exact full gradient is expensive to compute, so exact gradient-based methods become impractical or prohibitively costly. The stochastic gradient descent (SGD) method, which traces back to the seminal work of [1], has instead become the dominant approach for solving (1). In the $t$-th iteration, SGD picks an index $i \in [n]$ at random and updates the iterate $w_t$ by $w_{t+1} = w_t - \eta_t \nabla f_i(w_t)$, where $\eta_t > 0$ is the step size and $\nabla f_i(w_t)$ denotes the sample gradient at $w_t$.

A surge of methods has been developed to improve the performance of SGD. One of the most prevalent types is the class of gradient aggregation algorithms [3], such as SAG [4,5] and SAGA [6]. They compute a stochastic gradient as an average of stochastic gradients evaluated at previous iterates, which requires storing those previous stochastic gradients at the expense of memory. SVRG [7] has two loops, with a full gradient computed in the outer loop (each outer iteration is called an epoch) and stochastic gradients with lower variance computed in the inner loop. S2GD [14] runs a random number of stochastic gradient steps in each inner loop.
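To make the setting concrete, the following is a minimal NumPy sketch (not from the paper) of problem (1) with an illustrative least-squares loss, together with the plain SGD update stated above and an SVRG-style epoch as described in this paragraph; the loss, step sizes, and loop lengths are assumptions chosen purely for illustration.

```python
import numpy as np

# Illustrative instance of problem (1): f_i(w) = 0.5 * (a_i^T w - b_i)^2,
# so that P(w) = (1/(2n)) * ||A w - b||^2.  All parameter choices below are
# assumptions made only so that the sketch runs end to end.

def grad_fi(w, A, b, i):
    """Gradient of the i-th component function f_i at w."""
    a_i = A[i]
    return (a_i @ w - b[i]) * a_i

def full_grad(w, A, b):
    """Exact gradient of P(w) = (1/n) * sum_i f_i(w)."""
    return A.T @ (A @ w - b) / A.shape[0]

def sgd(w0, A, b, eta, T, rng):
    """Plain SGD: w_{t+1} = w_t - eta * grad f_i(w_t) for a random index i."""
    n, w = A.shape[0], w0.copy()
    for _ in range(T):
        i = rng.integers(n)
        w -= eta * grad_fi(w, A, b, i)
    return w

def svrg_epoch(w_tilde, A, b, eta, m, rng):
    """One SVRG epoch: a full gradient at the snapshot w_tilde, then m inner
    steps with the variance-reduced estimate
        g_t = grad f_i(w_t) - grad f_i(w_tilde) + grad P(w_tilde)."""
    n = A.shape[0]
    mu = full_grad(w_tilde, A, b)   # full gradient, computed once per epoch
    w = w_tilde.copy()
    for _ in range(m):
        i = rng.integers(n)
        g = grad_fi(w, A, b, i) - grad_fi(w_tilde, A, b, i) + mu
        w -= eta * g
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 500, 20
    A = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d)
    b = A @ w_star                               # consistent system: optimum is w_star
    eta = 0.1 / np.max(np.sum(A**2, axis=1))     # conservative step: 0.1 / max_i L_i

    w_sgd = sgd(np.zeros(d), A, b, eta=eta, T=10 * n, rng=rng)
    print("SGD  distance to optimum:", np.linalg.norm(w_sgd - w_star))

    w = np.zeros(d)
    for _ in range(10):                          # 10 outer iterations (epochs)
        w = svrg_epoch(w, A, b, eta=eta, m=n, rng=rng)
    print("SVRG distance to optimum:", np.linalg.norm(w - w_star))
```

Gradient aggregation methods such as SAG and SAGA would instead maintain a table of the most recent stochastic gradient for each of the $n$ components, trading SVRG's per-epoch full gradient for additional storage, which is the memory cost mentioned above.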