Provable accelerated gradient method for nonconvex low rank optimization
Huan Li1 · Zhouchen Lin2

Received: 28 October 2017 / Revised: 11 June 2019 / Accepted: 13 June 2019
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Editor: Tijl De Bie. This work was done when Huan Li was a Ph.D. student at Peking University.

Zhouchen Lin (corresponding author): [email protected]
Huan Li: [email protected]

1 School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
2 Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, Beijing, China
Abstract
Optimization over low rank matrices has broad applications in machine learning. For large-scale problems, an attractive heuristic is to factorize the low rank matrix into a product of two much smaller matrices. In this paper, we study the nonconvex problem $\min_{U \in \mathbb{R}^{n \times r}} g(U) = f(UU^T)$ under the assumptions that $f(X)$ is restricted $\mu$-strongly convex and $L$-smooth on the set $\{X : X \succeq 0, \mathrm{rank}(X) \le r\}$. We propose an accelerated gradient method with alternating constraint that operates directly on the $U$ factors and show that the method has a local linear convergence rate with the optimal $\sqrt{L/\mu}$ dependence on the condition number $L/\mu$. Globally, our method converges to a critical point with zero gradient from any initializer. Our method also applies to the problem with the asymmetric factorization $X = UV^T$, and the same convergence result can be obtained. Extensive experimental results verify the advantage of our method.
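The restricted strong convexity and smoothness assumption above can be formalized in the standard way; the following sandwich inequality is the usual form of such a definition (the paper's exact statement is not included in this excerpt): for all $X, Y \in \{Z : Z \succeq 0, \mathrm{rank}(Z) \le r\}$,
$$\frac{\mu}{2}\|Y - X\|_F^2 \;\le\; f(Y) - f(X) - \langle \nabla f(X), Y - X \rangle \;\le\; \frac{L}{2}\|Y - X\|_F^2 .$$
Requiring the inequality only on this low rank positive semidefinite set is weaker than global strong convexity, which makes the assumption plausible for problems such as matrix sensing.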
1 Introduction

Low rank matrix estimation has broad applications in machine learning, computer vision and signal processing. In this paper, we consider the problem of the form
$$\min_{X \in \mathbb{R}^{n \times n}} f(X), \quad \mathrm{s.t.}\ X \succeq 0, \tag{1}$$
where there exists a minimizer $X^*$ of rank $r$. We consider the case $r \ll n$. Optimizing problem (1) in the $X$ space often requires computing at least the top-$r$ singular values/vectors at each iteration and $O(n^2)$ memory to store a large $n \times n$ matrix, which restricts
its application to huge matrices. To reduce the computational cost as well as the storage space, much of the literature exploits the observation that a positive semidefinite low rank matrix can be factorized as a product of two much smaller matrices, i.e., $X = UU^T$, and studies the following nonconvex problem instead:
$$\min_{U \in \mathbb{R}^{n \times r}} g(U) = f(UU^T). \tag{2}$$
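Operating directly on the $U$ factors requires the gradient of the composite objective; by the chain rule (standard matrix calculus, not spelled out in this excerpt),
$$\nabla g(U) = \left(\nabla f(UU^T) + \nabla f(UU^T)^T\right) U = 2\,\nabla f(UU^T)\,U,$$
where the second equality holds whenever $\nabla f$ is symmetric at $X = UU^T$. The iterate that must be stored is thus only $n \times r$, and no singular value decomposition is required per iteration.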
A wide family of problems can be cast as problem (2), including matrix sensing (Bhojanapalli et al. 2016b), matrix completion (Jain et al. 2013), one bit matrix completion (Davenport et al. 2014), sparse principal component analysis (Cai et al. 2013) and factorization machines (Lin and Ye 2016). In this paper, we study problem (2) and aim to propose an accelerated gradient method that operates directly on the $U$ factors. The factorization in problem (2) makes $g(U)$ nonconvex even if $f(X)$ is convex, so proving acceleration is a harder task than the corresponding analysis for convex programming.
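As a concrete illustration of gradient descent operating directly on the $U$ factors (plain gradient descent, not the accelerated method proposed in this paper), the following sketch instantiates $f$ as a matrix sensing least-squares loss; the problem sizes, data generation, step size, and iteration count are all illustrative assumptions:

```python
# Sketch: plain (unaccelerated) gradient descent on g(U) = f(UU^T), with f a
# matrix sensing loss f(X) = (1/2m) * sum_i (<A_i, X> - b_i)^2. All sizes,
# the step size, and the initializer are illustrative, not the paper's
# experimental settings.
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 30, 3, 400                 # ambient size, rank, number of measurements

# Ground truth X* = U*(U*)^T and symmetric Gaussian sensing matrices A_i.
U_star = rng.standard_normal((n, r))
X_star = U_star @ U_star.T
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2   # symmetrize so that grad f is symmetric
b = np.einsum('kij,ij->k', A, X_star)

def grad_g(U):
    # Chain rule: grad g(U) = 2 * grad f(UU^T) @ U for symmetric grad f,
    # with grad f(X) = (1/m) * sum_i (<A_i, X> - b_i) * A_i.
    X = U @ U.T
    residual = np.einsum('kij,ij->k', A, X) - b
    grad_f = np.einsum('k,kij->ij', residual, A) / m
    return 2 * grad_f @ U

U = rng.standard_normal((n, r))      # random initializer
eta = 2e-3                           # illustrative step size
for _ in range(2000):
    U -= eta * grad_g(U)

err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
print(f"relative error: {err:.2e}")
```

Note that $U$ and $UR$ with any orthogonal $R$ give the same $X = UU^T$, so progress is measured on $UU^T$ rather than on $U$ itself; this rotational ambiguity is one reason the analysis of factored methods is more delicate than in the convex setting.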
1.1 Related work