Applying FISTA to optimization problems (with or) without minimizers




Heinz H. Bauschke¹ ([email protected]) · Minh N. Bui² ([email protected]) · Xianfu Wang¹ ([email protected])

¹ Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada
² Department of Mathematics, North Carolina State University, Raleigh, NC 27695-8205, USA

Received: 22 November 2018 / Accepted: 9 July 2019
© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2019

Abstract
Beck and Teboulle's FISTA method for finding a minimizer of the sum of two convex functions, one of which has a Lipschitz continuous gradient whereas the other may be nonsmooth, is arguably the most important optimization algorithm of the past decade. While research activity on FISTA has exploded ever since, the mathematically challenging case when the original optimization problem has no minimizer has received only limited attention. In this work, we systematically study FISTA and its variants. We present general results that are applicable regardless of the existence of minimizers.

Keywords Convex function · FISTA · Forward-backward method · Nesterov acceleration · Proximal gradient algorithm

Mathematics Subject Classification Primary 90C25 · 65K05; Secondary 49M27

1 Introduction

We assume that

H is a real Hilbert space (1)

with inner product ⟨· | ·⟩ and associated norm ‖·‖. We also presuppose throughout the paper that

f : H → R and g : H → ]−∞, +∞[ ∪ {+∞}, i.e., g : H → ]−∞, +∞] (2)


satisfy the following:

Assumption 1.1
(A1) f is convex and Fréchet differentiable on H, and ∇f is β-Lipschitz continuous with β ∈ ]0, +∞[;
(A2) g is convex, lower semicontinuous, and proper;
(A3) γ ∈ ]0, 1/β] is a parameter.

One fundamental problem in optimization is to

minimize f + g over H. (3)

For convenience, we set

h := f + g and T := Prox_{γg} ◦ (Id − γ∇f), (4)

where we follow standard notation in convex analysis (as employed, e.g., in [8]).
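To make (4) concrete, the following sketch instantiates T for an illustrative problem that is not taken from the paper: f(x) = ½‖Ax − b‖² with g = λ‖·‖₁, for which Prox_{γg} reduces to componentwise soft-thresholding. All names here (A, b, lam, grad_f, prox_g, T) are our own illustrative choices.

```python
import numpy as np

# Illustrative instance (assumption, not from the paper):
# f(x) = 0.5 * ||A x - b||^2 and g = lam * ||.||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
lam = 0.1

def grad_f(x):
    # Gradient of the smooth term f(x) = 0.5 * ||A x - b||^2.
    return A.T @ (A @ x - b)

beta = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of grad f
gamma = 1.0 / beta                 # step size in ]0, 1/beta], per (A3)

def prox_g(x, step):
    # Prox of step * lam * ||.||_1: componentwise soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)

def T(x):
    # Forward-backward (proximal gradient) operator from (4):
    # T = Prox_{gamma g} o (Id - gamma grad f).
    return prox_g(x - gamma * grad_f(x), gamma)

x1 = T(np.zeros(50))   # one forward-backward step from the origin
```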

Then many algorithms designed for solving (3) employ the forward-backward or proximal gradient operator T in some fashion. Since the advent of Nesterov's acceleration [22] (when g ≡ 0) and Beck and Teboulle's fast proximal gradient method FISTA [11] (see also [9, Chapter 10]), the literature on algorithms for solving (3) has exploded; see, e.g., [1–3,5,7,11,18,22] for a selection of key contributions. Indeed, out of nearly one million mathematical publications that have appeared since 2009 and are indexed by Mathematical Reviews, the 2009 FISTA paper [11] by Beck and Teboulle takes the number two spot! (In passing, we note that it has been cited more than 6,000 times on Google Scholar, where it now receives about three new citations every day.) The overwhelming majority of these papers assume that problem (3) has a solution to start with. Complementing and contributing to these analyses, we follow a path less trodden: the aim of this paper is to study the behaviour of the fast proximal gradient methods (and monotone variants
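For reference, the FISTA iteration discussed above alternates a forward-backward step x_k = T(y_k) with a Nesterov-style extrapolation of the points y_k. Below is a minimal sketch of that standard iteration (our own rendering, not code from [11]; the function name fista and its defaults are illustrative):

```python
def fista(grad_f, prox_g, x0, gamma, n_iter=500):
    # Standard FISTA iteration: a forward-backward step taken from an
    # extrapolated point y_k, followed by a Nesterov momentum update.
    x, y, t = x0, x0, 1.0
    for _ in range(n_iter):
        x_next = prox_g(y - gamma * grad_f(y), gamma)       # x_k = T(y_k)
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0   # t_{k+1}
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)    # extrapolation
        x, t = x_next, t_next
    return x

# With the definitions from the previous sketch,
#   x_star = fista(grad_f, prox_g, np.zeros(50), gamma)
# approximately solves that lasso instance.
```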