Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization

Andrzej Ruszczyński
Department of Management Science and Information Systems, Rutgers University, Piscataway, NJ 08854, USA
[email protected]

Received: 28 August 2019 / Accepted: 9 January 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
We prove convergence of a single time-scale stochastic subgradient method with subgradient averaging for constrained problems with a nonsmooth and nonconvex objective function having the property of generalized differentiability. As a tool of our analysis, we also prove a chain rule on a path for such functions.

Keywords Stochastic subgradient method · Nonsmooth optimization · Generalized differentiable functions · Chain rule

1 Introduction

We consider the problem

    min_{x ∈ X} f(x),                                            (1)
where X ⊂ R^n is convex and closed, and f : R^n → R is a Lipschitz continuous function, which may be neither convex nor smooth. The subgradients of f(·) are not available; instead, we postulate access to their random estimates. Research on stochastic subgradient methods for nonsmooth and nonconvex functions started in the late 1970s. Early contributions are due to Nurminski, who considered weakly convex functions and established a general methodology for studying convergence of non-monotonic methods [20]; Gupal and his co-authors, who considered convolution smoothing (mollification) of Lipschitz functions and the resulting finite-difference methods [11]; and Norkin, who considered unconstrained problems with "generalized differentiable" functions [17, Ch. 3 and 7]. Recently, by an approach
via differential inclusions, Duchi and Ruan [10] studied proximal methods for sum-composite problems with weakly convex functions, Davis et al. [8] proved convergence of the subgradient method for locally Lipschitz Whitney-stratifiable functions with constraints, and Majewski et al. [15] studied several methods for subdifferentially regular Lipschitz functions. Our objective is to show that a single time-scale stochastic subgradient method with direction averaging [21,22] is convergent for a broad class of functions enjoying the property of "generalized differentiability," which contains all the classes of functions mentioned above, as well as their compositions. Our analysis follows the approach of relating a stochastic approximation algorithm to a continuous-time dynamical system, pioneered in [13,14] and developed in many works (see, e.g., [12] and the references therein). The extension to multifunctions was proposed in [1] and further developed, among others, in [3,8,10,15]. For the purpose of our analysis, we also prove a chain rule on a path under generalized differentiability, which may be of independent interest. Finally, we illustrate the use of the method for training a ReLU neural network.
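
To fix ideas, the following minimal sketch implements a projected stochastic subgradient iteration with direction averaging for problem (1). It is an illustration under stated assumptions, not the exact recursion analyzed in this paper: the oracle subgrad_est, the projection project_X, the averaging parameter a, the step sizes tau_k, and the test instance are hypothetical placeholders.

import numpy as np

def averaged_subgradient_method(subgrad_est, project_X, x0, steps, a=0.5):
    # Iterate x_{k+1} = Pi_X(x_k - tau_k z_k) and
    # z_{k+1} = z_k + a * tau_k * (g_{k+1} - z_k),
    # where g_{k+1} is a random subgradient estimate at x_{k+1}
    # (an illustrative form of single time-scale averaging).
    x = np.asarray(x0, dtype=float)
    z = np.zeros_like(x)                 # averaged direction estimate
    for tau in steps:                    # diminishing step sizes tau_k
        x = project_X(x - tau * z)       # projected step along -z
        g = subgrad_est(x)               # stochastic subgradient estimate
        z = z + a * tau * (g - z)        # averaging of subgradient estimates
    return x

# Illustrative use on f(x) = |x_1| - 0.5 |x_2| over the box X = [-1, 1]^2
# (nonsmooth, nonconvex, Lipschitz), with a noisy subgradient oracle.
rng = np.random.default_rng(0)
subgrad = lambda x: np.array([np.sign(x[0]), -0.5 * np.sign(x[1])]) + 0.01 * rng.standard_normal(2)
proj = lambda x: np.clip(x, -1.0, 1.0)
taus = [1.0 / (k + 10.0) for k in range(2000)]
x_final = averaged_subgradient_method(subgrad, proj, [0.3, 0.2], taus)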

2 The chain formula on a path

Norkin [19] introduced the following class of functions.

Definition 1 A function f : R^n → R is differentiable in