LayerOut: Freezing Layers in Deep Neural Networks



ORIGINAL RESEARCH

Kelam Goutam¹ · S. Balasubramanian¹ · Darshan Gera¹ · R. Raghunatha Sarma¹

Received: 16 May 2020 / Accepted: 28 August 2020
© Springer Nature Singapore Pte Ltd 2020

¹ Department of Mathematics and Computer Science, Sri Sathya Sai Institute of Higher Learning, Prashantinilayam, India

Abstract
Deep networks involve a huge amount of computation during the training phase and are prone to over-fitting. To ameliorate these issues, several conventional techniques such as DropOut, DropConnect, Guided Dropout, Stochastic Depth, and BlockDrop have been proposed. These techniques regularize a neural network by dropping nodes, connections, layers, or blocks within the network. However, these conventional regularization techniques suffer from the limitation that they are suited either to fully connected networks or to ResNet-based architectures. In this research, we propose LayerOut, a novel regularization technique for training deep neural networks that stochastically freezes the trainable parameters of a layer during an epoch of training. This technique can be applied both to fully connected networks and to all types of convolutional networks such as VGG-16, ResNet, etc. Experimental evaluation on multiple datasets, including MNIST, CIFAR-10, and CIFAR-100, demonstrates that LayerOut generalizes better than the conventional regularization techniques and additionally reduces the computational burden significantly. We have observed up to 70% reduction in computation per epoch and up to 2% improvement in classification accuracy as compared to the baseline networks (VGG-16 and ResNet-110) on the above datasets. Code is publicly available at https://github.com/Goutam-Kelam/LayerOut.

Keywords: LayerOut · DropOut · DropConnect · Guided Dropout · Stochastic Depth · BlockDrop
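The core mechanism described in the abstract, freezing a layer's trainable parameters for the duration of an epoch, can be illustrated with a minimal PyTorch sketch. This is not the authors' released implementation (see the GitHub link above); the toy network, the freezing probability `p`, and the helper `apply_layerout` are illustrative assumptions only.

```python
# Minimal sketch of stochastic layer freezing (assumed details, not the paper's code).
# At the start of each epoch, every hidden layer is independently frozen with
# probability p by disabling gradients for its parameters; frozen layers still
# participate in the forward pass but receive no parameter updates.
import random
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(784, 512),
            nn.Linear(512, 512),
            nn.Linear(512, 10),
        ])

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i < len(self.layers) - 1:
                x = torch.relu(x)
        return x


def apply_layerout(model, p=0.5):
    """Freeze each hidden layer for the coming epoch with probability p."""
    for layer in model.layers[:-1]:              # keep the output layer trainable
        freeze = random.random() < p
        for param in layer.parameters():
            param.requires_grad = not freeze


model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(10):
    apply_layerout(model, p=0.5)
    # ... run one epoch of training here; plain SGD skips parameters whose
    # gradients are None, so frozen layers are not updated this epoch
```

Because frozen layers neither store activation gradients nor update their weights, a sketch like this also hints at where the reported per-epoch computation savings come from; the exact freezing schedule used in the paper is described in later sections.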

Introduction

The recent trend in the deep learning community is to use deeper neural networks (DNNs) [8, 27] to solve real-life problems such as image classification [17, 26], language translation [19, 29], object detection [5, 21], speech recognition [2, 6], etc. However, deeper neural networks have been empirically found to display the undesirable characteristic of being prone to over-fitting. Further, the computational load of training a deeper network is by no means trivial. This makes the deployment of deeper models in real-time environments, such as interactive applications on mobile devices and autonomous driving, a challenging task.

In the literature, there are multiple techniques to reduce over-fitting in DNNs, such as data augmentation, which increases the number of training samples; semi-supervised learning [10], which additionally uses a large amount of unsupervised data to train the DNNs; and transfer learning [4, 15], which additionally uses models pre-trained on a large amount of supervised data. The regularization tech