On the interplay between acceleration and identification for the proximal gradient algorithm



Gilles Bareilles¹ (Gilles.Bareilles@univ-grenoble-alpes.fr) · Franck Iutzeler¹ (Franck.Iutzeler@univ-grenoble-alpes.fr)

¹ Laboratoire Jean Kuntzmann, Univ. Grenoble Alpes, Saint-Martin-d'Hères, France

Received: 19 September 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

In this paper, we study the interplay between acceleration and structure identification for the proximal gradient algorithm. While acceleration is generally beneficial in terms of functional decrease, we report and analyze several cases where its interplay with identification has negative effects on the algorithm's behavior (oscillation of the iterates, loss of structure, etc.). We then present a generic method that tames acceleration when structure identification may be at stake; under a qualifying condition, it benefits from a convergence rate matching that of the accelerated proximal gradient. We show empirically that the proposed method is much more stable than the accelerated proximal gradient in terms of subspace identification, while achieving a similar functional decrease.

Keywords: Proximal gradient · Accelerated methods · Identification

1 Introduction

In this paper, we consider composite optimization problems of the form

min_{x ∈ ℝⁿ} F(x) := f(x) + g(x)        (1.1)

where f and g are convex functions. We are interested in the case where f is smooth (i.e. differentiable with a Lipschitz-continuous gradient) while g is nonsmooth and enforces some structure on the solutions of (1.1) (e.g. belonging to some set, sparsity, etc.). This type of problem often appears in signal processing and machine learning, where the goal is usually to find a point with a low error on the task at hand while maintaining some structure. For instance, taking g as the 𝓁1 norm promotes sparsity; it is commonly used in the recovery of sparse signals and in compressed sensing [13]. We refer the reader to [40] for an overview.

Problem (1.1) is typically solved by the proximal gradient algorithm, a special case of Forward-Backward splitting, which alternates a gradient step on the smooth function f with a proximal step on the nonsmooth function g; see e.g. [4, Chap. 27] and the sketch below. Moreover, in order to improve the convergence of this algorithm, accelerated (also called fast or inertial) versions have been widely promoted, notably through the popularity of FISTA [7]. These modifications produce the next iterate by linearly combining the previous outputs of the proximal operator. Using combination coefficients as in Nesterov's fast method [31] (or similar ones [3, 15]) improves the convergence rate of the proximal gradient both practically and theoretically (albeit at the price of a more involved analysis, see e.g. [1, 15, 30]). Additionally, the handling of the nonsmooth function g by a proximity operator enforces some structure on the iterates.
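To make this concrete, here is a minimal sketch of the proximal gradient iteration on an 𝓁1-regularized least-squares (lasso) instance of (1.1): f(x) = ½‖Ax − b‖² is smooth, and g(x) = λ‖x‖₁ is handled through its proximal operator, which is the soft-thresholding operator. This is an illustrative sketch, not code from the paper; the data A, b and the regularization weight lam are placeholder assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proximal_gradient(A, b, lam, n_iter=500):
    """Proximal gradient (forward-backward) for min 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad f (spectral norm squared)
    gamma = 1.0 / L                      # step size
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                             # forward: gradient step on f
        x = soft_threshold(x - gamma * grad, gamma * lam)    # backward: proximal step on g
    return x

# Illustrative data: a sparse signal recovered from a few random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true
x_hat = proximal_gradient(A, b, lam=0.1)
print("nonzeros:", np.count_nonzero(x_hat))   # the prox sets entries exactly to zero
```

Because soft-thresholding sets small entries exactly to zero, the iterates themselves carry the sparse structure enforced by g; this is the identification phenomenon the paper studies.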
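For comparison, here is a minimal sketch of the accelerated (FISTA-like) variant, reusing soft_threshold from the sketch above; the extrapolation coefficients follow Nesterov's classical sequence, one of several possible choices [3, 15, 31]. Again, this is an assumed illustration rather than the method proposed in the paper.

```python
def accelerated_proximal_gradient(A, b, lam, n_iter=500):
    """FISTA-like accelerated proximal gradient for the same lasso instance.

    The next iterate linearly combines the two previous outputs of the
    proximal operator (inertial / extrapolation step).
    """
    L = np.linalg.norm(A, 2) ** 2
    gamma = 1.0 / L
    x_prev = np.zeros(A.shape[1])
    y = x_prev.copy()                    # extrapolated point fed to the prox-grad step
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x = soft_threshold(y - gamma * grad, gamma * lam)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0   # Nesterov coefficient update
        y = x + ((t - 1.0) / t_next) * (x - x_prev)         # momentum on prox outputs
        x_prev, t = x, t_next
    return x_prev
```

Note that the extrapolated point y need not share the support of x when consecutive iterates differ in structure; this is exactly the kind of interaction between acceleration and identification analyzed in the paper.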