On the optimization landscape of tensor decompositions
- PDF / 634,660 Bytes
- 47 Pages / 439.37 x 666.142 pts Page_size
- 6 Downloads / 189 Views
Series B
On the optimization landscape of tensor decompositions Rong Ge1
· Tengyu Ma2
Received: 25 July 2019 / Accepted: 8 October 2020 © Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2020
Abstract Non-convex optimization with local search heuristics has been widely used in machine learning, achieving many state-of-art results. It becomes increasingly important to understand why they can work for these NP-hard problems on typical data. The landscape of many objective functions in learning has been conjectured to have the geometric property that “all local optima are (approximately) global optima”, and thus they can be solved efficiently by local search algorithms. However, establishing such property can be very difficult. In this paper, we analyze the optimization landscape of the random over-complete tensor decomposition problem, which has many applications in unsupervised learning, especially in learning latent variable models. In practice, it can be efficiently solved by gradient ascent on a non-convex objective. We show that for any small constant ε > 0, among the set of points with function values (1 + ε)-factor larger than the expectation of the function, all the local maxima are approximate global maxima. Previously, the best-known result only characterizes the geometry in small neighborhoods around the true components. Our result implies that even with an initialization that is barely better than the random guess, the gradient ascent algorithm is guaranteed to solve this problem. However, achieving such a initialization with random guess would still require super-polynomial number of attempts. Our main technique uses Kac–Rice formula and random matrix theory. To our best knowledge, this is the first time when Kac–Rice formula is successfully applied to counting the number of local optima of a highly-structured random polynomial with dependent coefficients.
A shorter version of this paper has appeared in NIPS 2017 [1]. Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10107020-01579-x) contains supplementary material, which is available to authorized users.
B
Rong Ge [email protected] Tengyu Ma [email protected]
1
Computer Science Department, Duke University, Durham, USA
2
Computer Science and Statistics, Stanford University, Stanford, USA
123
R. Ge, T. Ma
Keywords Tensor decomposition · Optimization landscape · Kac–Rice formula Mathematics Subject Classification 90C26
1 Introduction Non-convex optimization is the dominating algorithmic technique behind many stateof-art results in machine learning, computer vision, natural language processing and reinforcement learning. Local search algorithms through stochastic gradient methods are simple, scalable and easy to implement. It has been conjectured [2,3] that on typical data, the landscape of many non-convex training objectives has the nice geometric property that all local minima are (approximate) global minima. Such property assures the local search
Data Loading...