Accelerated Gradient Sliding for Minimizing a Sum of Functions



COMPUTER SCIENCE

D. M. Dvinskikh^a, S. S. Omelchenko^b,*, A. V. Gasnikov^a,**, and A. I. Tyurin^c

^a Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
^b Moscow Institute of Physics and Technology (National Research University), Dolgoprudnyi, Moscow oblast, 141701 Russia
^c National Research University Higher School of Economics, Moscow, 101000 Russia
*e-mail: [email protected]
**e-mail: [email protected]

Presented by Academician of the RAS Yu.G. Evtushenko March 10, 2020
Received March 20, 2020; revised March 26, 2020; accepted April 3, 2020

Abstract—We propose a new way of justifying the accelerated gradient sliding of G. Lan, which allows one to extend the sliding technique to a combination of an accelerated gradient method with an accelerated variance reduction method. New optimal estimates for the solution of the problem of minimizing a sum of smooth strongly convex functions with a smooth regularizer are obtained.

Keywords: accelerated gradient sliding of G. Lan, accelerated variance reduction methods, smooth strongly convex functions

DOI: 10.1134/S1064562420030084

Many problems in data analysis (machine learning) lead to the minimization of a function having the form of a sum (empirical risk) of numerous terms corresponding to the sample size [1, 4, 14, 15]. Numerical methods for optimizing sum-type functions have been developed extensively over the last decade [1, 4, 6, 9]. In particular, optimal methods (accelerated variance reduction methods) for problems of this class were obtained in the case of a sum of smooth (strongly) convex functions (see, e.g., [9]). Problems have also been studied in which the objective involves an additional, possibly nonsmooth but convex (strongly convex) composite term (the term responsible for "regularization" in data-analysis language), and this term is proximal-friendly [1, 10], i.e., minimizing it plus a quadratic addition is a simple problem. In this paper, we propose a method for obtaining optimal estimates in the case when the composite term is convex (strongly convex) and smooth, but not proximal-friendly. The terms in the sum are also not assumed to be proximal-friendly.
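To illustrate what proximal-friendliness means, consider the standard textbook example (not from this paper) of the l1 regularizer, whose proximal step has a closed form given by soft-thresholding. A minimal sketch, assuming NumPy; the function name is ours:

    import numpy as np

    def prox_l1(y, lam, step):
        # Proximal step for r(x) = lam * ||x||_1: solves
        #   argmin_x  lam * ||x||_1 + (1 / (2 * step)) * ||x - y||_2^2
        # in closed form by coordinate-wise soft-thresholding.
        return np.sign(y) * np.maximum(np.abs(y) - lam * step, 0.0)

    # The quadratic addition decouples the coordinates, which is why
    # this subproblem is "simple" for proximal-friendly terms.
    print(prox_l1(np.array([0.3, -1.2, 0.05]), lam=0.1, step=1.0))

A smooth composite term that is not proximal-friendly admits no such closed-form step, which is what motivates the sliding machinery developed below.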

In Section 1, Lan's accelerated gradient sliding [9, Section 8.2] is explained using the now-popular catalyst construction [1, 11]. The resulting method is used to extend the sliding technique to the class of problems under study. In Section 2, the results are generalized to various nonsmooth problem formulations, including generalized linear models [15] and other models admitting effective smoothing [2, 13].
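The catalyst construction [1, 11] wraps a base method in an outer proximal-point loop: each outer step approximately minimizes the objective plus a quadratic penalty centered at an extrapolated prox-center. The sketch below is a generic illustration of that idea under our own naming (inner_solver, kappa, and the extrapolation schedule are placeholders), not the specific scheme derived in this paper:

    import numpy as np

    def catalyst(grad_F, inner_solver, x0, kappa, n_outer):
        # Outer proximal-point loop: each iteration hands the regularized
        # subproblem  z -> F(z) + (kappa / 2) * ||z - y||_2^2  (whose
        # gradient is grad_F(z) + kappa * (z - y)) to an inner method,
        # warm-started at the current iterate.
        x, y = np.copy(x0), np.copy(x0)
        for t in range(n_outer):
            x_next = inner_solver(lambda z: grad_F(z) + kappa * (z - y), x)
            # Nesterov-style extrapolation of the prox-center is what gives
            # the outer loop its acceleration (schedule simplified here).
            y = x_next + (t / (t + 3.0)) * (x_next - x)
            x = x_next
        return x

The design point is that acceleration lives entirely in the outer loop, so a cheap non-accelerated method can be used on the well-conditioned inner subproblems.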



1. MAIN RESULTS

Consider the problem

$$F(x) = f(x) + g(x) = f(x) + \frac{1}{m}\sum_{k=1}^{m} g_k(x) \to \min_{x}, \qquad (1)$$

where $f$ and the $g_k$ have $L_f$- and $L_g$-Lipschitz continuous gradients in the 2-norm, $F$ is a $\mu$-strongly convex function in the 2-norm, and $\mu \le L_f$. Problem (1) involves the additional condition $m \le L_g/\mu$. Lan's result [9, Section 8.2] is that this problem can
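To make the setting of (1) concrete, here is a small hypothetical instance (all names and data are ours): the $g_k$ are per-sample logistic losses and $f$ is a quadratic regularizer, with rough estimates of the constants entering the assumptions above.

    import numpy as np

    rng = np.random.default_rng(0)
    m, d, mu = 1000, 20, 1e-3
    A = rng.standard_normal((m, d))
    b = rng.choice([-1.0, 1.0], size=m)

    def f(x):
        # Smooth, mu-strongly convex regularizer; here L_f = mu.
        return 0.5 * mu * (x @ x)

    def g(x):
        # g(x) = (1/m) * sum_k g_k(x) with logistic losses g_k.
        return np.mean(np.log1p(np.exp(-b * (A @ x))))

    def F(x):
        # Full objective of problem (1).
        return f(x) + g(x)

    # Each logistic term g_k has an (||A[k]||^2 / 4)-Lipschitz gradient.
    L_f = mu
    L_g = np.max(np.einsum('ij,ij->i', A, A)) / 4.0
    print(F(np.zeros(d)), m <= L_g / mu)  # checks the condition m <= L_g / mu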