Less Is More: Towards Compact CNNs
University of Maryland, College Park, USA · Data61/CSIRO, Eveleigh, Australia · Australian National University, Canberra, Australia
Abstract. To attain a favorable performance on large-scale datasets, convolutional neural networks (CNNs) are usually designed to have very high capacity, involving millions of parameters. In this work, we aim at optimizing the number of neurons in a network, and thus the number of parameters. We show that, by incorporating sparse constraints into the objective function, it is possible to decimate the number of neurons during the training stage. As a result, the number of parameters and the memory footprint of the network are also reduced, which is desirable at test time as well. We evaluate our method on several well-known CNN architectures, including AlexNet and VGG, on different datasets including ImageNet. Extensive experimental results demonstrate that our method leads to compact networks. Taking the first fully connected layer as an example, our compact CNN contains only 30% of the original neurons without any degradation of the top-1 classification accuracy.

Keywords: Convolutional neural network · Neuron reduction · Sparsity
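As a rough illustration of the kind of sparse constraint described in the abstract, the sketch below adds a per-neuron group-sparsity penalty (a group lasso over each neuron's incoming weights) to an ordinary training loss in PyTorch. This is a minimal sketch under assumed settings, not the paper's exact formulation: the layer sizes, the penalty weight lam, and the training_step helper are all illustrative.

import torch
import torch.nn as nn

def neuron_group_lasso(weight: torch.Tensor) -> torch.Tensor:
    # weight has shape (out_features, in_features); each row holds one neuron's
    # incoming weights. Summing the rows' L2 norms encourages entire rows
    # (i.e., entire neurons) to be driven to zero.
    return weight.norm(p=2, dim=1).sum()

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 1024),  # the layer whose neurons we want to sparsify
    nn.ReLU(),
    nn.Linear(1024, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
lam = 1e-4  # strength of the sparse constraint (hypothetical value)

def training_step(x, y):
    optimizer.zero_grad()
    loss = criterion(model(x), y) + lam * neuron_group_lasso(model[1].weight)
    loss.backward()
    optimizer.step()
    return loss.item()

After training with such a penalty, rows whose norm has been driven to (numerically) zero correspond to neurons that can be removed from the layer, which is what shrinks the parameter count and memory footprint.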
1 Introduction
The last few years have witnessed the success of deep convolutional neural networks (CNNs) in many computer vision applications. One important reason is the emergence of large annotated datasets and the development of high-performance computing hardware, which facilitate the training of high-capacity CNNs with an exceptionally large number of parameters. When defining the structure of a network, large networks are often preferred, and strong regularizers [36] tend to be applied, to give the network as much discriminative power as possible. As a result, state-of-the-art CNNs nowadays contain hundreds of millions of parameters [34]. Most of these parameters come from one or two layers that host a large number of neurons. Take AlexNet [23] as an example: the first and the second fully connected layers, which have 4096 neurons each, account for the vast majority of the network's parameters.
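To make the scale of these two layers concrete, the back-of-the-envelope count below uses the standard AlexNet layer sizes (the 9216-dimensional input to the first fully connected layer comes from the 256 x 6 x 6 convolutional feature map); the ~61 million figure is the commonly cited overall parameter count and is used here only to form the ratio.

# Parameter counts of AlexNet's two large fully connected layers (biases included).
fc6 = 9216 * 4096 + 4096   # about 37.8M parameters
fc7 = 4096 * 4096 + 4096   # about 16.8M parameters
total = 61_000_000         # commonly cited AlexNet total (approximate)
print((fc6 + fc7) / 1e6)   # ~54.5M parameters in just these two layers
print((fc6 + fc7) / total) # roughly 0.89 of all parameters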
Table 1. Neuron reduction in the first fully connected layer, the total parameter compression, the memory reduced, and the top-1 validation error rate (in parentheses: the error rate without sparse constraints).

Network          Neurons (%)^b   Parameters (%)^c   Memory reduced (MB)^a   Top-1 error (%)
LeNet            97.80           92.00              1.52                    0.63 (0.72)
CIFAR-10 quick   73.44           33.42              0.19                    25.25 (24.75)
AlexNet          73.73           65.42              152.14                  46.10 (45.57)
VGG-13           76.21           61.29              311.06                  39.26 (37.50)

^a Assuming a single-precision type is used to store the weights
^b Results on the number of neurons in the first fully connected layer
^c Results on the total number of parameters
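As a sanity check on the "Memory reduced" column, the arithmetic below relates the parameter compression to bytes saved under footnote a's single-precision assumption; the ~61 million AlexNet total is an outside figure, not taken from the table.

# Memory saved = removed parameters x 4 bytes (single precision), reported in MB (2^20 bytes).
total_params = 61_000_000        # approximate AlexNet parameter count (assumption)
removed = 0.6542 * total_params  # 65.42% total parameter compression from Table 1
print(removed * 4 / 2**20)       # ~152 MB, consistent with the 152.14 MB entry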