On the Linear Convergence of the Approximate Proximal Splitting Method for Non-smooth Convex Optimization

  • PDF / 389,521 Bytes
  • 19 Pages / 439.37 x 666.142 pts Page_size
  • 34 Downloads / 198 Views

DOWNLOAD

REPORT


On the Linear Convergence of the Approximate Proximal Splitting Method for Non-smooth Convex Optimization Mojtaba Kadkhodaie • Maziar Sanjabi Zhi-Quan Luo



Received: 16 April 2014 / Revised: 23 May 2014 / Accepted: 28 May 2014  Operations Research Society of China, Periodicals Agency of Shanghai University, and SpringerVerlag Berlin Heidelberg 2014

Abstract Consider the problem of minimizing the sum of two convex functions, one being smooth and the other non-smooth. In this paper, we introduce a general class of approximate proximal splitting (APS) methods for solving such minimization problems. Methods in the APS class include many well-known algorithms such as the proximal splitting method, the block coordinate descent method (BCD), and the approximate gradient projection methods for smooth convex optimization. We establish the linear convergence of APS methods under a local error bound assumption. Since the latter is known to hold for compressive sensing and sparse group LASSO problems, our analysis implies the linear convergence of the BCD method for these problems without strong convexity assumption. Keywords Convex optimization  Proximal splitting method  Block coordinate descent method  Convergence rate analysis  Local error bound

1 Introduction Consider a constrained convex minimization of the form min x2X

FðxÞ ¼ f1 ðxÞ þ f2 ðxÞ;

ð1:1Þ

where X is a convex closed set (can be Rn ), f1 is a convex function (may be nonsmooth), and f2 is a smooth convex function with Lipschitz continuous gradient.

M. Kadkhodaie (&)  M. Sanjabi  Z.-Q. Luo University of Minnesota, Minneapolis, MN, USA e-mail: [email protected]

123

M. Kadkhodaie et al.

1.1 Motivating Applications Non-smooth convex optimization problems of the form (1.1) arise in many contemporary statistical and signal processing applications including compressive sensing, signal denoising, and sparse logistic regression. In the sequel, we outline some of the most recent applications of problem (1.1). 1.1.1 LASSO Problem Suppose that we have a noisy observation vector b 2 Rm about an unknown sparse vector x 2 Rn , where the signal model is linear and given by b  Ax; for some given matrix A 2 Rmn . One of the most popular techniques to estimate the sparse vector x is called LASSO [29]. LASSO can be viewed as an ‘1 -norm regularized linear least squares problem 1 minn kAx  bk2 þ kkxk1 ; x2R 2

ð1:2Þ

where the first term 12 kAx  bk2 reduces the estimation error, and the second term kkxk1 promotes the sparsity of the solution. The parameter k controls the sparsity level of the solution. The higher k is, the fewer non-zero entries would be in the LASSO solution. Clearly, by setting f2 ðxÞ ¼ 12 kAx  bk2 , f1 ðxÞ ¼ kkxk1 and X ¼ Rn , problem (1.2) becomes a special case of problem (1.1). 1.1.2 Group LASSO Problem In many applications, such as image denoising, the desired solution should have the so called group sparse structure [35], i.e., the solution x should admit a group P 0 separable structure x ¼ ½x1 ;    ; xK  , where xi 2 Rni and Ki¼1