Nonsmoothness in Machine Learning: Specific Structure, Proximal Identification, and Applications

  • PDF / 1,060,861 Bytes
  • 18 Pages / 439.642 x 666.49 pts Page_size
  • 56 Downloads / 192 Views

DOWNLOAD

REPORT


Nonsmoothness in Machine Learning: Specific Structure, Proximal Identification, and Applications ´ ome ˆ Franck Iutzeler1 · Jer Malick2 Received: 7 April 2020 / Accepted: 30 September 2020 / © Springer Nature B.V. 2020

Abstract Nonsmoothness is often a curse for optimization; but it is sometimes a blessing, in particular for applications in machine learning. In this paper, we present the specific structure of nonsmooth optimization problems appearing in machine learning and illustrate how to leverage this structure in practice, for compression, acceleration, or dimension reduction. We pay a special attention to the presentation to make it concise and easily accessible, with both simple examples and general results. Keywords Nonsmooth optimization · Proximal algorithms · Identification · Machine learning applications Mathematics Subject Classification (2010) 49J52 · 49Q12 · 90C25 · 68W15

1 Introduction Optimization is at the core of machine learning and nonsmoothness plays there a special role. Often, nonsmooth functions are introduced in learning problems to enforce low-complexity of the optimal model parameters. This promoted structure (e.g. sparsity, low-rank, or controlled variation) turns out to be progressively identified by proximal algorithms; most interestingly, it can be leveraged numerically to improve optimization methods. This situation contrasts with a large part of the optimization literature where dealing with nonsmooth functions or checking nonsmooth substructure is usually uneasy. In this paper, we lay out a generic framework of optimization problems for which nonsmoothness can be exploited, covering most machine learning problems with low-complexity priors. We then go over the main technical tools that enable to mathematically and practically handle nonsmooth structures. We pay a special attention to being accessible for a

 Franck Iutzeler

[email protected] 1

University of Grenoble Alpes, Grenoble, France

2

CNRS, LJK, Grenoble, France

F. Iutzeler, J. Malick

wide audience1 in optimization, including graduate students in applied maths. No specific knowledge on machine learning or nonsmooth analysis is necessary to follow our developments, but fundamentals in mathematical optimization are required. Let us mention that we will make a constant (but basic) use of the proximal operator which is a key tool to deal with explicit nonsmoothness. For a function g : Rn → R ∪ {+∞} and a parameter γ > 0, we define proxγ g (u) for any u ∈ Rn as the optimal solution of the following problem:   1 proxγ g (u) := arg minn g(y) + (prox) y − u22 . y∈R 2γ This properly defines an operator proxγ g : Rn → Rn in many cases, in particular when g is convex. We refer to the textbooks [31] and [6] or to the review [43] for more information and an historical perspective. The rest of the paper is organized as follows. In Section 2, we explain why nonsmooth models are considered in machine learning. In Section 3, we formalize a large class of nonsmooth optimization problems for which structure can