Ada-boundary: accelerating DNN training via adaptive boundary batch selection

Hwanjun Song1 · Sundong Kim2 · Minseok Kim1 · Jae‑Gil Lee1

1 Graduate School of Knowledge Service Engineering, KAIST, Daejeon, Korea
2 Institute for Basic Science, Daejeon, Korea

Received: 3 December 2019 / Revised: 21 June 2020 / Accepted: 11 August 2020
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Editors: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.

Abstract
Neural networks converge faster with the help of a smart batch selection strategy. In this regard, we propose Ada-Boundary, a novel and simple adaptive batch selection algorithm that constructs an effective mini-batch according to the learning progress of the model. Our key idea is to exploit confusing samples for which the model cannot predict labels with high confidence; samples near the current decision boundary are considered the most effective for expediting convergence. Owing to this design, Ada-Boundary maintains its advantage across varying degrees of training difficulty. We demonstrate the advantage of Ada-Boundary by extensive experiments using CNNs on five benchmark data sets. Ada-Boundary reduces the test error by up to 31.80% relative to the baseline for a fixed wall-clock training time, thereby achieving a faster convergence speed.

Keywords Batch selection · Acceleration · Convergence · Decision boundary
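To make the key idea concrete, the following is a minimal sketch, not the authors' implementation: it uses the margin between the two largest softmax probabilities as a stand-in for a sample's distance to the decision boundary and draws a mini-batch biased toward low-margin, i.e., confusing, samples. The function names and the weighting scheme are our own assumptions.

```python
import numpy as np

def boundary_scores(softmax_probs):
    """Proxy for closeness to the decision boundary: the margin between the two
    largest class probabilities. A small margin means the model is confused, so
    the sample lies near the boundary. (Illustrative only; the paper defines its
    own distance measure.)"""
    top2 = np.sort(softmax_probs, axis=1)[:, -2:]  # two largest probabilities per sample
    return top2[:, 1] - top2[:, 0]                 # margin in [0, 1]

def select_boundary_batch(softmax_probs, batch_size, rng=None):
    """Draw a mini-batch that favors samples close to the boundary (low margin)
    while keeping some randomness in the selection."""
    rng = rng if rng is not None else np.random.default_rng()
    margin = boundary_scores(softmax_probs)
    weights = (1.0 - margin) + 1e-3                # near-boundary samples get larger weight
    weights = weights / weights.sum()
    return rng.choice(len(margin), size=batch_size, replace=False, p=weights)

# Usage: `probs` stands for the model's current softmax outputs on the training set.
probs = np.random.dirichlet(np.ones(10), size=1000)   # dummy (N=1000, C=10) probabilities
batch_indices = select_boundary_batch(probs, batch_size=128)
```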

1 Introduction

Deep neural networks (DNNs) have achieved remarkable performance in many fields, especially in computer vision and natural language processing (Goodfellow et al. 2016). Nevertheless, as the size of a data set grows, the training step via stochastic gradient descent (SGD) based on mini-batches suffers from an extremely high computational cost, which is mainly due to slow convergence.

[Figure 1: panel (a) "Difficulty distribution" plots probability versus sample difficulty for an easy case (MNIST) and a hard case (CIFAR-10); panel (b) "Hard sample oriented training" illustrates SGD on a hard batch near the decision boundary.]

Fig. 1 Analysis of the hard batch selection strategy: (a) shows the true sample distribution according to the difficulty computed by Eq. (1) at a training accuracy of 60%. An easy data set (MNIST) has no "too hard" samples but only "moderately hard" samples colored in gray, whereas a relatively hard data set (CIFAR-10) has many "too hard" samples colored in black. (b) shows the result of SGD on a hard batch: the moderately hard samples are informative for updating the model, but the too hard samples make the model overfit to them.
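To illustrate the distinction the caption draws between easy, moderately hard, and too hard samples, the sketch below uses one minus the softmax probability of the true label as a difficulty score and buckets samples with arbitrary thresholds; both the score and the thresholds are our own assumptions, not the paper's Eq. (1).

```python
import numpy as np

def difficulty(softmax_probs, labels):
    """Illustrative difficulty score: 1 - P(true label | x).
    A hypothetical stand-in for the paper's Eq. (1)."""
    true_prob = softmax_probs[np.arange(len(labels)), labels]
    return 1.0 - true_prob

def difficulty_histogram(scores, easy_thr=0.3, too_hard_thr=0.9):
    """Count samples in the easy / moderately hard / too hard regions.
    Thresholds are arbitrary and chosen only for illustration."""
    return {
        "easy": int(np.sum(scores < easy_thr)),
        "moderately hard": int(np.sum((scores >= easy_thr) & (scores < too_hard_thr))),
        "too hard": int(np.sum(scores >= too_hard_thr)),
    }
```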

The common approaches for expediting convergence include SGD variants (Zeiler 2012; Kingma and Ba 2015) that maintain individual learning rates for parameters