Nonsmoothness in Machine Learning: Specific Structure, Proximal Identification, and Applications
Franck Iutzeler¹ · Jérôme Malick²
Received: 7 April 2020 / Accepted: 30 September 2020 / © Springer Nature B.V. 2020
¹ University of Grenoble Alpes, Grenoble, France ([email protected])
² CNRS, LJK, Grenoble, France
Abstract  Nonsmoothness is often a curse for optimization; but it is sometimes a blessing, in particular for applications in machine learning. In this paper, we present the specific structure of nonsmooth optimization problems appearing in machine learning and illustrate how to leverage this structure in practice, for compression, acceleration, or dimension reduction. We pay special attention to the presentation, keeping it concise and easily accessible, with both simple examples and general results.

Keywords  Nonsmooth optimization · Proximal algorithms · Identification · Machine learning applications

Mathematics Subject Classification (2010)  49J52 · 49Q12 · 90C25 · 68W15
1 Introduction

Optimization is at the core of machine learning, and nonsmoothness plays a special role there. Nonsmooth functions are often introduced in learning problems to enforce low complexity of the optimal model parameters. The promoted structure (e.g. sparsity, low rank, or controlled variation) turns out to be progressively identified by proximal algorithms; most interestingly, it can be leveraged numerically to improve optimization methods. This situation contrasts with a large part of the optimization literature, where dealing with nonsmooth functions or detecting nonsmooth substructure is usually difficult. In this paper, we lay out a generic framework of optimization problems for which nonsmoothness can be exploited, covering most machine learning problems with low-complexity priors. We then go over the main technical tools that allow one to handle nonsmooth structures both mathematically and practically.
We pay special attention to being accessible to a wide audience in optimization, including graduate students in applied mathematics. No specific knowledge of machine learning or nonsmooth analysis is necessary to follow our developments, but fundamentals in mathematical optimization are required. Let us mention that we will make constant (but basic) use of the proximal operator, which is a key tool to deal with explicit nonsmoothness. For a function g : R^n → R ∪ {+∞} and a parameter γ > 0, we define prox_{γg}(u) for any u ∈ R^n as the optimal solution of the following problem:

$$ \operatorname{prox}_{\gamma g}(u) := \operatorname*{arg\,min}_{y \in \mathbb{R}^n} \; g(y) + \frac{1}{2\gamma} \|y - u\|_2^2 \qquad \text{(prox)} $$

This properly defines an operator prox_{γg} : R^n → R^n in many cases, in particular when g is convex. We refer to the textbooks [31] and [6] or to the review [43] for more information and a historical perspective. The rest of the paper is organized as follows. In Section 2, we explain why nonsmooth models are considered in machine learning. In Section 3, we formalize a large class of nonsmooth optimization problems for which structure can be exploited.
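To make the definition concrete, here is a minimal numerical sketch (not from the paper): for g = λ‖·‖₁, the proximal operator has the closed-form soft-thresholding expression, and a proximal-gradient iteration using it produces iterates with exactly zero coordinates, illustrating the identification phenomenon discussed later. The problem data (A, b, λ) and step size are illustrative assumptions.

```python
# Minimal sketch (not from the paper): prox of g = lam * ||.||_1 and
# proximal-gradient iterations on a toy least-squares + l1 problem.
import numpy as np

def prox_l1(u, gamma):
    """prox_{gamma * ||.||_1}(u): componentwise soft-thresholding."""
    return np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
lam = 0.5
gamma = 1.0 / np.linalg.norm(A, 2) ** 2   # step 1/L for f(x) = 0.5*||Ax - b||^2

x = np.zeros(50)
for _ in range(500):
    grad = A.T @ (A @ x - b)              # gradient of the smooth part
    x = prox_l1(x - gamma * grad, gamma * lam)

# Many coordinates of x are exactly zero: the sparsity promoted by the
# l1 term is identified by the proximal iterations.
print("nonzeros:", np.count_nonzero(x), "of", x.size)
```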