Accelerated Gradient Sliding for Minimizing a Sum of Functions



COMPUTER SCIENCE

D. M. Dvinskikh^a, S. S. Omelchenko^b,*, A. V. Gasnikov^a,**, and A. I. Tyurin^c

^a Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
^b Moscow Institute of Physics and Technology (National Research University), Dolgoprudnyi, Moscow oblast, 141701 Russia
^c National Research University Higher School of Economics, Moscow, 101000 Russia
*e-mail: [email protected]
**e-mail: [email protected]

Presented by Academician of the RAS Yu.G. Evtushenko March 10, 2020
Received March 20, 2020; revised March 26, 2020; accepted April 3, 2020

Abstract—We propose a new way of justifying the accelerated gradient sliding of G. Lan, which allows one to extend the sliding technique to a combination of an accelerated gradient method with an accelerated variance reduction method. New optimal estimates for the solution of the problem of minimizing a sum of smooth strongly convex functions with a smooth regularizer are obtained.

Keywords: accelerated gradient sliding of G. Lan, accelerated variance reduction methods, smooth strongly convex functions

DOI: 10.1134/S1064562420030084

Many problems in data analysis (machine learning) lead to the minimization of a function having the form of a sum (empirical risk) of numerous terms corresponding to the sample size [1, 4, 14, 15]. Numerical methods for optimizing sum-type functions have been developed extensively over the last decade [1, 4, 6, 9]. In particular, optimal methods (accelerated variance reduction methods) for problems of this class were obtained in the case of a sum of smooth (strongly) convex functions (see, e.g., [9]). Problems have also been studied in which the objective involves an additional, possibly nonsmooth but convex (strongly convex) composite term (the term responsible for "regularization" in data-analysis language), and this term is proximal-friendly [1, 10], i.e., minimizing it plus a quadratic addition is a simple problem. In this paper, we propose a method for obtaining optimal estimates in the case when the composite term is convex (strongly convex) and smooth, but not proximal-friendly. The terms in the sum are also not assumed to be proximal-friendly.
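To illustrate what proximal-friendliness means, consider the standard textbook example (not from this paper) of the l1 regularizer, whose proximal step has a closed form given by soft-thresholding. A minimal sketch, assuming NumPy; the function name is ours:

    import numpy as np

    def prox_l1(y, lam, step):
        # Proximal step for r(x) = lam * ||x||_1: solves
        #   argmin_x  lam * ||x||_1 + (1 / (2 * step)) * ||x - y||_2^2
        # in closed form by coordinate-wise soft-thresholding.
        return np.sign(y) * np.maximum(np.abs(y) - lam * step, 0.0)

    # The quadratic addition decouples the coordinates, which is why
    # this subproblem is "simple" for proximal-friendly terms.
    print(prox_l1(np.array([0.3, -1.2, 0.05]), lam=0.1, step=1.0))

A smooth composite term that is not proximal-friendly admits no such closed-form step, which is what motivates the sliding machinery developed below.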

In Section 1, Lan's accelerated gradient sliding [9, Section 8.2] is explained using the now-popular catalyst construction [1, 11]. The resulting method is used to extend the sliding technique to the class of problems under study. In Section 2, the results are generalized to various nonsmooth problem formulations, including generalized linear models [15] and other models admitting effective smoothing [2, 13].
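The catalyst construction [1, 11] wraps a base method in an outer proximal-point loop: each outer step approximately minimizes the objective plus a quadratic penalty centered at an extrapolated prox-center. The sketch below is a generic illustration of that idea under our own naming (inner_solver, kappa, and the extrapolation schedule are placeholders), not the specific scheme derived in this paper:

    import numpy as np

    def catalyst(grad_F, inner_solver, x0, kappa, n_outer):
        # Outer proximal-point loop: each iteration hands the regularized
        # subproblem  z -> F(z) + (kappa / 2) * ||z - y||_2^2  (whose
        # gradient is grad_F(z) + kappa * (z - y)) to an inner method,
        # warm-started at the current iterate.
        x, y = np.copy(x0), np.copy(x0)
        for t in range(n_outer):
            x_next = inner_solver(lambda z: grad_F(z) + kappa * (z - y), x)
            # Nesterov-style extrapolation of the prox-center is what gives
            # the outer loop its acceleration (schedule simplified here).
            y = x_next + (t / (t + 3.0)) * (x_next - x)
            x = x_next
        return x

The design point is that acceleration lives entirely in the outer loop, so a cheap non-accelerated method can be used on the well-conditioned inner subproblems.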



1. MAIN RESULTS

Consider the problem

$$F(x) = f(x) + g(x) = f(x) + \frac{1}{m}\sum_{k=1}^{m} g_k(x) \to \min_{x}, \qquad (1)$$

where $f$ and the $g_k$ have $L_f$- and $L_g$-Lipschitz continuous gradients in the 2-norm, $F$ is a $\mu$-strongly convex function in the 2-norm, and $\mu \le L_f$. Problem (1) involves the additional condition $m \le L_g/\mu$. Lan's result [9, Section 8.2] is that this problem can
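To make the setting of (1) concrete, here is a small hypothetical instance (all names and data are ours): the $g_k$ are per-sample logistic losses and $f$ is a quadratic regularizer, with rough estimates of the constants entering the assumptions above.

    import numpy as np

    rng = np.random.default_rng(0)
    m, d, mu = 1000, 20, 1e-3
    A = rng.standard_normal((m, d))
    b = rng.choice([-1.0, 1.0], size=m)

    def f(x):
        # Smooth, mu-strongly convex regularizer; here L_f = mu.
        return 0.5 * mu * (x @ x)

    def g(x):
        # g(x) = (1/m) * sum_k g_k(x) with logistic losses g_k.
        return np.mean(np.log1p(np.exp(-b * (A @ x))))

    def F(x):
        # Full objective of problem (1).
        return f(x) + g(x)

    # Each logistic term g_k has an (||A[k]||^2 / 4)-Lipschitz gradient.
    L_f = mu
    L_g = np.max(np.einsum('ij,ij->i', A, A)) / 4.0
    print(F(np.zeros(d)), m <= L_g / mu)  # checks the condition m <= L_g / mu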