Regularisation of neural networks by enforcing Lipschitz continuity



Henry Gouk¹ · Eibe Frank² · Bernhard Pfahringer² · Michael J. Cree²

¹ University of Edinburgh, Edinburgh, Scotland
² University of Waikato, Hamilton, New Zealand

Received: 20 December 2019 / Revised: 12 October 2020 / Accepted: 25 October 2020
© The Author(s) 2020

Abstract

We investigate the effect of explicitly enforcing the Lipschitz continuity of neural networks with respect to their inputs. To this end, we provide a simple technique for computing an upper bound to the Lipschitz constant—for multiple p-norms—of a feed-forward neural network composed of commonly used layer types. Our technique is then used to formulate training a neural network with a bounded Lipschitz constant as a constrained optimisation problem that can be solved using projected stochastic gradient methods. Our evaluation study shows that the performance of the resulting models exceeds that of models trained with other common regularisers. We also provide evidence that the hyperparameters are intuitive to tune, demonstrate how the choice of norm for computing the Lipschitz constant impacts the resulting model, and show that the performance gains provided by our method are particularly noticeable when only a small amount of training data is available.

Keywords: Neural networks · Regularisation · Lipschitz continuity

1 Introduction

Supervised learning is primarily concerned with the problem of approximating a function given examples of what output should be produced for a particular input. For the approximation to be of any practical use, it must generalise to unseen data points. Thus, we need to select an appropriate space of functions in which the machine should search for a good approximation, and select an algorithm to search through this space. This is typically done by first picking a large family of models, such as support vector machines or decision trees, and applying a suitable search algorithm. Crucially, when performing the search, regularisation techniques specific to the chosen model family must be employed to combat overfitting. For example, one could limit the depth of decision trees considered by a learning algorithm, or impose probabilistic priors on tunable model parameters.

Regularisation of neural network models is a particularly difficult challenge. The methods that are currently most effective (Srivastava et al. 2014; Ioffe and Szegedy 2015) are heuristically motivated. In contrast, well-understood regularisation approaches adapted from linear models, such as applying an ℓ2 penalty term to the model parameters, are known to be less effective than the heuristic approaches (Srivastava et al. 2014). This provides a clear motivation for developing well-founded and effective regularisation methods for neural networks.
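To make the approach described in the abstract concrete, the snippet below is a minimal illustrative sketch (not the paper's exact algorithm or code) of a projected stochastic gradient step for a single linear layer: after each gradient update, the weight matrix is rescaled whenever its induced p-norm exceeds a chosen bound λ, so the layer's Lipschitz constant with respect to that norm stays at or below λ. The helper names (operator_norm, project) and the toy least-squares objective are assumptions made for illustration.

```python
import numpy as np

def operator_norm(W, p):
    """Induced operator norm of a weight matrix for p in {1, 2, inf}."""
    if p == 1:
        return np.abs(W).sum(axis=0).max()   # maximum absolute column sum
    if p == 2:
        return np.linalg.norm(W, 2)          # largest singular value
    if p == np.inf:
        return np.abs(W).sum(axis=1).max()   # maximum absolute row sum
    raise ValueError("p must be 1, 2 or inf")

def project(W, lam, p):
    """Rescale W so its induced p-norm is at most lam (no-op if already within the bound)."""
    return W / max(1.0, operator_norm(W, p) / lam)

# Toy projected-SGD step for a linear layer y = W x with a squared-error loss.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x, y = rng.normal(size=5), rng.normal(size=3)
lam, lr = 2.0, 0.1

grad = np.outer(W @ x - y, x)          # gradient of 0.5 * ||W x - y||^2 w.r.t. W
W = project(W - lr * grad, lam, p=2)   # gradient step followed by the norm constraint
print(operator_norm(W, 2) <= lam + 1e-9)  # True: the bound holds after the step
```

In a full network, a step of this kind would be applied to every weight layer, and the product of the per-layer bounds gives an upper bound on the Lipschitz constant of the whole model.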