Regularisation of neural networks by enforcing Lipschitz continuity



Henry Gouk¹ · Eibe Frank² · Bernhard Pfahringer² · Michael J. Cree²

¹ University of Edinburgh, Edinburgh, Scotland
² University of Waikato, Hamilton, New Zealand

Received: 20 December 2019 / Revised: 12 October 2020 / Accepted: 25 October 2020
© The Author(s) 2020

Abstract

We investigate the effect of explicitly enforcing the Lipschitz continuity of neural networks with respect to their inputs. To this end, we provide a simple technique for computing an upper bound to the Lipschitz constant—for multiple p-norms—of a feed-forward neural network composed of commonly used layer types. Our technique is then used to formulate training a neural network with a bounded Lipschitz constant as a constrained optimisation problem that can be solved using projected stochastic gradient methods. Our evaluation study shows that the performance of the resulting models exceeds that of models trained with other common regularisers. We also provide evidence that the hyperparameters are intuitive to tune, demonstrate how the choice of norm for computing the Lipschitz constant impacts the resulting model, and show that the performance gains provided by our method are particularly noticeable when only a small amount of training data is available.

Keywords: Neural networks · Regularisation · Lipschitz continuity

1 Introduction

Supervised learning is primarily concerned with the problem of approximating a function given examples of what output should be produced for a particular input. For the approximation to be of any practical use, it must generalise to unseen data points. Thus, we need to select an appropriate space of functions in which the machine should search for a good approximation, and select an algorithm to search through this space. This is typically done by first picking a large family of models, such as support vector machines or decision trees, and applying a suitable search algorithm. Crucially, when performing the search, regularisation techniques specific to the chosen model family must be employed to combat overfitting. For example, one could limit the depth of decision trees considered by a learning algorithm, or impose probabilistic priors on tunable model parameters.

Regularisation of neural network models is a particularly difficult challenge. The methods that are currently most effective (Srivastava et al. 2014; Ioffe and Szegedy 2015) are heuristically motivated. In contrast, well-understood regularisation approaches adapted from linear models, such as applying an ℓ2 penalty term to the model parameters, are known to be less effective than the heuristic approaches (Srivastava et al. 2014). This provides a clear motivation for developing well-founded and effective regularisation methods for neural networks.
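To make the approach described in the abstract concrete, the snippet below is a minimal illustrative sketch (not the paper's exact algorithm or code) of a projected stochastic gradient step for a single linear layer: after each gradient update, the weight matrix is rescaled whenever its induced p-norm exceeds a chosen bound λ, so the layer's Lipschitz constant with respect to that norm stays at or below λ. The helper names (operator_norm, project) and the toy least-squares objective are assumptions made for illustration.

```python
import numpy as np

def operator_norm(W, p):
    """Induced operator norm of a weight matrix for p in {1, 2, inf}."""
    if p == 1:
        return np.abs(W).sum(axis=0).max()   # maximum absolute column sum
    if p == 2:
        return np.linalg.norm(W, 2)          # largest singular value
    if p == np.inf:
        return np.abs(W).sum(axis=1).max()   # maximum absolute row sum
    raise ValueError("p must be 1, 2 or inf")

def project(W, lam, p):
    """Rescale W so its induced p-norm is at most lam (no-op if already within the bound)."""
    return W / max(1.0, operator_norm(W, p) / lam)

# Toy projected-SGD step for a linear layer y = W x with a squared-error loss.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x, y = rng.normal(size=5), rng.normal(size=3)
lam, lr = 2.0, 0.1

grad = np.outer(W @ x - y, x)          # gradient of 0.5 * ||W x - y||^2 w.r.t. W
W = project(W - lr * grad, lam, p=2)   # gradient step followed by the norm constraint
print(operator_norm(W, 2) <= lam + 1e-9)  # True: the bound holds after the step
```

In a full network, a step of this kind would be applied to every weight layer, and the product of the per-layer bounds gives an upper bound on the Lipschitz constant of the whole model.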