Deep Networks with Stochastic Depth
Gao Huang¹, Yu Sun¹, Zhuang Liu², Daniel Sedra¹, Kilian Q. Weinberger¹
¹ Cornell University, Ithaca, USA {gh349,ys646,dms422,kqw4}@cornell.edu
² Tsinghua University, Beijing, China
Abstract. Very deep convolutional networks with hundreds of layers have led to significant reductions in error on competitive benchmarks. Although the unmatched expressiveness of the many layers can be highly desirable at test time, training very deep networks comes with its own set of challenges. The gradients can vanish, the forward flow often diminishes, and the training time can be painfully slow. To address these problems, we propose stochastic depth, a training procedure that enables the seemingly contradictory setup to train short networks and use deep networks at test time. We start with very deep networks but during training, for each mini-batch, randomly drop a subset of layers and bypass them with the identity function. This simple approach complements the recent success of residual networks. It reduces training time substantially and improves the test error significantly on almost all data sets that we used for evaluation. With stochastic depth we can increase the depth of residual networks even beyond 1200 layers and still yield meaningful improvements in test error (4.91 % on CIFAR-10).
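To make the training procedure concrete, below is a minimal PyTorch-style sketch, not code from the paper; the names StochasticDepthBlock, residual_fn, and survival_prob are illustrative assumptions. During training, each mini-batch keeps a block's residual branch with probability survival_prob and otherwise bypasses it with the identity; at test time the branch is always applied, scaled by its survival probability so that expected outputs match training.

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Illustrative residual block that is randomly bypassed during training.

    Sketch only (not the paper's reference implementation). `residual_fn`
    stands in for the block's conv-BN-ReLU branch; `survival_prob` is the
    probability that the branch is kept for a given mini-batch.
    """
    def __init__(self, residual_fn, survival_prob=0.8):
        super().__init__()
        self.residual_fn = residual_fn
        self.survival_prob = survival_prob

    def forward(self, x):
        if self.training:
            # Training: with probability (1 - survival_prob) skip the residual
            # branch entirely and pass the input through unchanged.
            if torch.rand(1).item() < self.survival_prob:
                return x + self.residual_fn(x)
            return x
        # Test time: always apply the branch, scaled by its survival
        # probability, so expected activations match those seen in training.
        return x + self.survival_prob * self.residual_fn(x)
```

In the paper, the survival probabilities decay linearly with depth, so blocks closer to the output are dropped more often than blocks near the input.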
1 Introduction
Convolutional Neural Networks (CNNs) were arguably popularized within the vision community in 2012 through AlexNet [1] and its celebrated victory at the ImageNet competition [2]. Since then there has been a notable shift towards CNNs in many areas of computer vision [3–8]. As this shift unfolds, a second trend emerges: deeper and deeper CNN architectures are being developed and trained. Whereas AlexNet had 5 convolutional layers [1], the VGG network and GoogLeNet in 2014 had 19 and 22 layers respectively [5,7], and most recently the ResNet architecture featured 152 layers [8]. Network depth is a major determinant of model expressiveness, both in theory [9,10] and in practice [5,7,8]. However, very deep models also introduce new challenges: vanishing gradients in backward propagation, diminishing feature reuse in forward propagation, and long training time.
G. Huang and Y. Sun contributed equally.
Vanishing Gradients is a well-known nuisance in neural networks with many layers [11]. As the gradient information is back-propagated, repeated multiplication or convolution with small weights renders the gradient information ineffectively small in earlier layers. Several approaches exist to reduce this effect in practice, for example through careful initialization [12], hidden layer supervision [13], or, recently, Batch Normalization [14].

Diminishing feature reuse during forward propagation (also known as loss in information flow [15]) refers to the analogous problem to vanishing gradients in the forward direction. The features of the input instance, or those computed by earlier layers, are gradually "washed out" through repeated multiplication or convolution with (randomly initialized) weight matrices.
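As a rough illustration (not from the paper), the toy NumPy snippet below shows the forward analogue of this effect: repeatedly multiplying a signal by small, randomly initialized weight matrices drives its norm toward zero, and back-propagated gradients shrink the same way because they pass through the transposes of the same matrices. The depth, width, and weight scale are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 64

x = rng.normal(size=width)                        # input features
norms = [np.linalg.norm(x)]
for _ in range(depth):
    W = 0.05 * rng.normal(size=(width, width))    # small random weights
    x = W @ x                                     # one linear "layer", no nonlinearity
    norms.append(np.linalg.norm(x))

# The signal norm collapses by many orders of magnitude with depth,
# mirroring how gradient magnitudes vanish in the backward pass.
print(f"norm at layer 0:  {norms[0]:.3e}")
print(f"norm at layer {depth}: {norms[-1]:.3e}")
```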