Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization

Andrzej Ruszczyński
Department of Management Science and Information Systems, Rutgers University, Piscataway, NJ 08854, USA
[email protected]

Received: 28 August 2019 / Accepted: 9 January 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
We prove convergence of a single time-scale stochastic subgradient method with subgradient averaging for constrained problems with a nonsmooth and nonconvex objective function having the property of generalized differentiability. As a tool of our analysis, we also prove a chain rule on a path for such functions.

Keywords Stochastic subgradient method · Nonsmooth optimization · Generalized differentiable functions · Chain rule

1 Introduction

We consider the problem

    min_{x ∈ X} f(x),                                            (1)
where X ⊂ R^n is convex and closed, and f : R^n → R is a Lipschitz continuous function, which may be neither convex nor smooth. The subgradients of f(·) are not available; instead, we postulate access to their random estimates. Research on stochastic subgradient methods for nonsmooth and nonconvex functions started in the late 1970s. Early contributions are due to Nurminski, who considered weakly convex functions and established a general methodology for studying convergence of non-monotonic methods [20]; Gupal and his co-authors, who considered convolution smoothing (mollification) of Lipschitz functions and the resulting finite-difference methods [11]; and Norkin, who considered unconstrained problems with "generalized differentiable" functions [17, Ch. 3 and 7]. Recently, by an approach
via differential inclusions, Duchi and Ruan [10] studied proximal methods for sum-composite problems with weakly convex functions, Davis et al. [8] proved convergence of the subgradient method for locally Lipschitz Whitney-stratifiable functions with constraints, and Majewski et al. [15] studied several methods for subdifferentially regular Lipschitz functions. Our objective is to show that a single time-scale stochastic subgradient method with direction averaging [21,22] is convergent for a broad class of functions enjoying the property of "generalized differentiability," which contains all the classes of functions mentioned above, as well as their compositions. Our analysis follows the approach of relating a stochastic approximation algorithm to a continuous-time dynamical system, pioneered in [13,14] and developed in many works (see, e.g., [12] and the references therein). The extension to multifunctions was proposed in [1] and further developed, among others, in [3,8,10,15]. For the purpose of our analysis, we also prove a chain rule on a path under generalized differentiability, which may be of independent interest. Finally, we illustrate the use of the method for training a ReLU neural network.
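
To fix ideas, the following minimal sketch implements a projected stochastic subgradient iteration with direction averaging for problem (1). It is an illustration under stated assumptions, not the exact recursion analyzed in this paper: the oracle subgrad_est, the projection project_X, the averaging parameter a, the step sizes tau_k, and the test instance are hypothetical placeholders.

import numpy as np

def averaged_subgradient_method(subgrad_est, project_X, x0, steps, a=0.5):
    # Iterate x_{k+1} = Pi_X(x_k - tau_k z_k) and
    # z_{k+1} = z_k + a * tau_k * (g_{k+1} - z_k),
    # where g_{k+1} is a random subgradient estimate at x_{k+1}
    # (an illustrative form of single time-scale averaging).
    x = np.asarray(x0, dtype=float)
    z = np.zeros_like(x)                 # averaged direction estimate
    for tau in steps:                    # diminishing step sizes tau_k
        x = project_X(x - tau * z)       # projected step along -z
        g = subgrad_est(x)               # stochastic subgradient estimate
        z = z + a * tau * (g - z)        # averaging of subgradient estimates
    return x

# Illustrative use on f(x) = |x_1| - 0.5 |x_2| over the box X = [-1, 1]^2
# (nonsmooth, nonconvex, Lipschitz), with a noisy subgradient oracle.
rng = np.random.default_rng(0)
subgrad = lambda x: np.array([np.sign(x[0]), -0.5 * np.sign(x[1])]) + 0.01 * rng.standard_normal(2)
proj = lambda x: np.clip(x, -1.0, 1.0)
taus = [1.0 / (k + 10.0) for k in range(2000)]
x_final = averaged_subgradient_method(subgrad, proj, [0.3, 0.2], taus)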

2 The chain formula on a path

Norkin [19] introduced the following class of functions.

Definition 1 A function f : R^n → R is differentiable in