Backtracking Gradient Descent Method and Some Applications in Large Scale Optimisation. Part 2: Algorithms and Experiments

Tuyen Trung Truong1 · Hang-Tuan Nguyen2

© The Author(s) 2020

Abstract
In this paper, we provide new results and algorithms (including backtracking versions of Nesterov accelerated gradient and Momentum) which are more applicable to large scale optimisation, as in Deep Neural Networks. We also demonstrate that Backtracking Gradient Descent (Backtracking GD) can obtain good upper bound estimates for local Lipschitz constants of the gradient, and that the convergence rate of Backtracking GD is similar to that in the classical work of Armijo. Experiments with the datasets CIFAR10 and CIFAR100 on various popular architectures verify a heuristic argument that, in the mini-batch practice, Backtracking GD stabilises to a finite union of sequences constructed from Standard GD, and show that our new algorithms (while automatically fine-tuning learning rates) perform better than current state-of-the-art methods such as Adam, Adagrad, Adadelta, RMSProp, Momentum and Nesterov accelerated gradient. To help readers avoid confusion between heuristics and more rigorously justified algorithms, we also provide a review of the current state of convergence results for gradient descent methods. Accompanying source code is available on GitHub.

Keywords Automation of learning rates · Backtracking · Deep neural networks · Random dynamical systems · Global convergence · Gradient descent · Image classification · Iterative optimisation · Large scale optimisation · Local minimum

Mathematics Subject Classification 65Kxx · 68Txx · 49Mxx · 68Uxx
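For orientation, the following is a minimal sketch of the classical Armijo-type backtracking rule that Backtracking GD is based on; it is not the authors' released implementation, and the function backtracking_gd, the parameters delta0, alpha, beta, and the quadratic test objective are illustrative choices made for this example.

import numpy as np

def backtracking_gd(f, grad_f, x0, delta0=1.0, alpha=0.5, beta=0.7,
                    tol=1e-8, max_iter=1000):
    """Gradient descent with an Armijo-type backtracking line search (sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        delta = delta0
        # Shrink the learning rate until the sufficient-decrease condition holds:
        # f(x - delta * g) <= f(x) - alpha * delta * ||g||^2
        while f(x - delta * g) > f(x) - alpha * delta * np.dot(g, g):
            delta *= beta
        x = x - delta * g
    return x

# Usage on a simple ill-conditioned quadratic (hypothetical test problem).
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
print(backtracking_gd(f, grad_f, x0=[3.0, -2.0]))

The point of the backtracking step is that the learning rate delta is chosen automatically at each iteration, rather than fixed in advance as in Standard GD.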
Tuyen Trung Truong, [email protected]
Hang-Tuan Nguyen, [email protected]

1 Matematikk Institut, Universitetet i Oslo, Blindern 0851, Oslo, Norway
2 Axon AI Research, Seattle, WA, USA
1 Introduction

In this section, we provide a non-technical overview of the important role and current practice of Gradient Descent methods (GD) in optimisation, in particular in large scale optimisation as in Deep Neural Networks (DNN), and of some new features of the main results in this paper.

One special feature of modern society is the need to solve large scale optimisation problems quickly, stably, efficiently and reproducibly. One exemplar is the development of Deep Learning, which has achieved spectacular results in recent years. Among the most famous novelties are AlphaGo (the first computer program ever to defeat human players, and in fact the best human players, at the game of Go) and new developments in self-driving cars. An important tool in Deep Learning is the DNN. Roughly speaking, a DNN is a parametrised family f(x, θ), built as a composition of very simple maps, aiming to approximate a feature y well. By choosing an explicit metric to measure the difference between the prediction f(x, θ) of the DNN and the "ground truth" y over a training set I, one obtains a specific value of θ as a global minimum of an associated optimisation problem.
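In standard notation (not taken verbatim from this paper), if L measures the discrepancy between a prediction and the ground truth, the associated training objective and the gradient descent update can be written as
\[
  F(\theta) \;=\; \frac{1}{|I|}\sum_{(x,y)\in I} L\bigl(f(x,\theta),\, y\bigr),
  \qquad
  \theta_{k+1} \;=\; \theta_k - \delta_k\, \nabla F(\theta_k),
\]
where the choice of the learning rates \(\delta_k\) is precisely what the backtracking procedure studied in this paper automates.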