Provable accelerated gradient method for nonconvex low rank optimization
Huan Li1 · Zhouchen Lin2

Received: 28 October 2017 / Revised: 11 June 2019 / Accepted: 13 June 2019
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Editor: Tijl De Bie. This work was done when Huan Li was a Ph.D. student at Peking University.

Zhouchen Lin (corresponding author): [email protected]
Huan Li: [email protected]

1 School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
2 Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, Beijing, China
Abstract
Optimization over low rank matrices has broad applications in machine learning. For large-scale problems, an attractive heuristic is to factorize the low rank matrix into a product of two much smaller matrices. In this paper, we study the nonconvex problem $\min_{U \in \mathbb{R}^{n \times r}} g(U) = f(UU^T)$ under the assumptions that $f(X)$ is restricted $\mu$-strongly convex and $L$-smooth on the set $\{X : X \succeq 0, \mathrm{rank}(X) \le r\}$. We propose an accelerated gradient method with alternating constraint that operates directly on the $U$ factors and show that the method has a local linear convergence rate with the optimal $\sqrt{L/\mu}$ dependence on the condition number $L/\mu$. Globally, our method converges to a critical point with zero gradient from any initializer. Our method also applies to the problem with the asymmetric factorization $X = UV^T$, and the same convergence result can be obtained. Extensive experimental results verify the advantage of our method.
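The restricted strong convexity and smoothness assumption above can be formalized in the standard way; the following sandwich inequality is the usual form of such a definition (the paper's exact statement is not included in this excerpt): for all $X, Y \in \{Z : Z \succeq 0, \mathrm{rank}(Z) \le r\}$,
$$\frac{\mu}{2}\|Y - X\|_F^2 \;\le\; f(Y) - f(X) - \langle \nabla f(X), Y - X \rangle \;\le\; \frac{L}{2}\|Y - X\|_F^2 .$$
Requiring the inequality only on this low rank positive semidefinite set is weaker than global strong convexity, which makes the assumption plausible for problems such as matrix sensing.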
1 Introduction

Low rank matrix estimation has broad applications in machine learning, computer vision and signal processing. In this paper, we consider the problem of the form
$$\min_{X \in \mathbb{R}^{n \times n}} f(X), \quad \mathrm{s.t.}\ X \succeq 0, \tag{1}$$
where there exists a minimizer $X^*$ of rank $r$. We consider the case $r \ll n$. Optimizing problem (1) in the $X$ space often requires computing at least the top-$r$ singular values/vectors at each iteration and $O(n^2)$ memory to store a large $n \times n$ matrix, which restricts
its application to huge matrices. To reduce the computational cost as well as the storage space, much of the literature exploits the observation that a positive semidefinite low rank matrix can be factorized as a product of two much smaller matrices, i.e., $X = UU^T$, and studies the following nonconvex problem instead:
$$\min_{U \in \mathbb{R}^{n \times r}} g(U) = f(UU^T). \tag{2}$$
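Operating directly on the $U$ factors requires the gradient of the composite objective; by the chain rule (standard matrix calculus, not spelled out in this excerpt),
$$\nabla g(U) = \left(\nabla f(UU^T) + \nabla f(UU^T)^T\right) U = 2\,\nabla f(UU^T)\,U,$$
where the second equality holds whenever $\nabla f$ is symmetric at $X = UU^T$. The iterate that must be stored is thus only $n \times r$, and no singular value decomposition is required per iteration.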
A wide family of problems can be cast as problem (2), including matrix sensing (Bhojanapalli et al. 2016b), matrix completion (Jain et al. 2013), one bit matrix completion (Davenport et al. 2014), sparse principal component analysis (Cai et al. 2013) and factorization machines (Lin and Ye 2016). In this paper, we study problem (2) and aim to propose an accelerated gradient method that operates directly on the $U$ factors. The factorization in problem (2) makes $g(U)$ nonconvex even if $f(X)$ is convex, so proving acceleration is a harder task than the corresponding analysis for convex programming.
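As a concrete illustration of gradient descent operating directly on the $U$ factors (plain gradient descent, not the accelerated method proposed in this paper), the following sketch instantiates $f$ as a matrix sensing least-squares loss; the problem sizes, data generation, step size, and iteration count are all illustrative assumptions:

```python
# Sketch: plain (unaccelerated) gradient descent on g(U) = f(UU^T), with f a
# matrix sensing loss f(X) = (1/2m) * sum_i (<A_i, X> - b_i)^2. All sizes,
# the step size, and the initializer are illustrative, not the paper's
# experimental settings.
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 30, 3, 400                 # ambient size, rank, number of measurements

# Ground truth X* = U*(U*)^T and symmetric Gaussian sensing matrices A_i.
U_star = rng.standard_normal((n, r))
X_star = U_star @ U_star.T
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2   # symmetrize so that grad f is symmetric
b = np.einsum('kij,ij->k', A, X_star)

def grad_g(U):
    # Chain rule: grad g(U) = 2 * grad f(UU^T) @ U for symmetric grad f,
    # with grad f(X) = (1/m) * sum_i (<A_i, X> - b_i) * A_i.
    X = U @ U.T
    residual = np.einsum('kij,ij->k', A, X) - b
    grad_f = np.einsum('k,kij->ij', residual, A) / m
    return 2 * grad_f @ U

U = rng.standard_normal((n, r))      # random initializer
eta = 2e-3                           # illustrative step size
for _ in range(2000):
    U -= eta * grad_g(U)

err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
print(f"relative error: {err:.2e}")
```

Note that $U$ and $UR$ with any orthogonal $R$ give the same $X = UU^T$, so progress is measured on $UU^T$ rather than on $U$ itself; this rotational ambiguity is one reason the analysis of factored methods is more delicate than in the convex setting.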
1.1 Related work