Machine learning from a continuous viewpoint, I


https://doi.org/10.1007/s11425-020-1773-8

Weinan E1,2,3,∗, Chao Ma2 & Lei Wu2

1Department of Mathematics, Princeton University, Princeton, NJ 08544, USA;
2Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ 08544, USA;
3Beijing Institute of Big Data Research, Beijing 100871, China

Email: [email protected], [email protected], [email protected]

Received May 27, 2020; accepted August 28, 2020

Abstract   We present a continuous formulation of machine learning, as a problem in the calculus of variations and differential-integral equations, in the spirit of classical numerical analysis. We demonstrate that conventional machine learning models and algorithms, such as the random feature model, the two-layer neural network model and the residual neural network model, can all be recovered (in a scaled form) as particular discretizations of different continuous formulations. We also present examples of new models, such as the flow-based random feature model, and new algorithms, such as the smoothed particle method and spectral method, that arise naturally from this continuous formulation. We discuss how the issues of generalization error and implicit regularization can be studied under this framework.

Keywords   machine learning, continuous formulation, flow-based model, gradient flow, particle approximation

MSC(2010)   41A99, 49M99

Citation: E W, Ma C, Wu L. Machine learning from a continuous viewpoint, I. Sci China Math, 2020, 63, https://doi.org/10.1007/s11425-020-1773-8

1  Introduction

We present a continuous formulation of machine learning. As usual, this continuous formulation consists of three components: a representation of functions, a loss functional and a training dynamics. For the representation of functions, we will discuss the integral transform-based models and the more advanced flow-based models. For the loss functional, we give examples that arise in supervised and unsupervised learning, as well as examples from the calculus of variations and partial differential equations (PDEs). For the training dynamics, we divide the unknown parameters into two classes: conserved and non-conserved. For non-conserved parameters, we use what is known in the physics literature as the model A dynamics [39], namely gradient flow in the usual L2 metric. For conserved parameters, we use what is known as the model B dynamics [39], namely gradient flow in the Wasserstein metric [41].

In this framework, machine learning becomes a calculus of variations or PDE-like problem, and different numerical algorithms can be used to discretize these continuous models. In particular, the two-layer neural network [6, 18] and deep residual neural network (ResNet) [24, 36] models can be recovered, in a scaled form, when the particle method is applied to particular versions of the integral transform-based and flow-based models, respectively.
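To fix ideas, here is a minimal sketch of the first of these correspondences (the notation is illustrative; the precise setting is developed below). An integral transform-based model represents a function as an average over features, weighted by a probability measure $\rho$:
$$
f(x;\rho) = \int a\,\sigma(w^{\top}x)\,\rho(\mathrm{d}a,\mathrm{d}w),
$$
where $\sigma$ is an activation function. Replacing $\rho$ by the empirical measure of $m$ particles $(a_j, w_j)$ gives
$$
f_m(x) = \frac{1}{m}\sum_{j=1}^{m} a_j\,\sigma(w_j^{\top}x),
$$
which is the two-layer neural network in scaled form, with the $1/m$ normalization in place of the conventional one. Gradient descent on the particles $(a_j, w_j)$ then serves as a particle discretization of the model B dynamics, the Wasserstein gradient flow $\partial_t \rho = \nabla\cdot\bigl(\rho\,\nabla\,\tfrac{\delta R}{\delta \rho}\bigr)$ for a risk functional $R(\rho)$.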
