Deep network compression with teacher latent subspace learning and LASSO
Oyebade K. Oyedotun¹ · Abd El Rahman Shabayek¹ · Djamila Aouada¹ · Björn Ottersten¹
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract

Deep neural networks have been shown to excel in understanding multimedia by using latent representations to learn complex and useful abstractions. However, they remain impractical for embedded devices due to memory constraints, high latency, and considerable power consumption at runtime. In this paper, we propose compressing deep models by learning lower-dimensional subspaces from their latent representations while maintaining a minimal loss of performance. We leverage the premise that deep convolutional neural networks extract many redundant features in order to learn new subspaces for feature representation. We construct a compressed model by reconstruction from the representations captured by an already trained large model. In contrast to state-of-the-art approaches, the proposed method does not rely on labeled data. Moreover, it allows the use of a sparsity-inducing LASSO parameter penalty to achieve better compression results than when the same penalty is used to train models from scratch. We perform extensive experiments using VGG-16 and wide ResNet models on the CIFAR-10, CIFAR-100, MNIST and SVHN datasets. For instance, VGG-16 with 8.96M parameters trained on CIFAR-10 was pruned by 81.03% with only a 0.26% loss in generalization performance. Correspondingly, the size of the VGG-16 model is reduced from 35MB to 6.72MB, facilitating compact storage, and its inference time drops from 1.1 secs to 0.6 secs, accelerating inference. Notably, the proposed student models outperform state-of-the-art approaches and the same models trained from scratch.

Keywords Deep neural network · Compression · Pruning · Subspace learning · LASSO
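To make the idea in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of the label-free objective it describes: a small student projection is trained to reconstruct the latent representation captured from an already trained teacher layer, with an L1 (LASSO) penalty on the student parameters. All names here (StudentProjection, train_step, lambda_l1, the 512/128 dimensions) are illustrative assumptions.

```python
# Minimal sketch: learn a lower-dimensional subspace of a teacher's latent
# representation by reconstruction, with a LASSO (L1) penalty on the student.
import torch
import torch.nn as nn

class StudentProjection(nn.Module):
    """Maps teacher features (d_teacher) to a smaller subspace (d_student)
    and back, so the subspace can be trained from activations alone."""
    def __init__(self, d_teacher, d_student):
        super().__init__()
        self.encode = nn.Linear(d_teacher, d_student)   # compressed subspace
        self.decode = nn.Linear(d_student, d_teacher)   # reconstruction head

    def forward(self, h_teacher):
        z = torch.relu(self.encode(h_teacher))
        return self.decode(z)

def train_step(student, h_teacher, optimizer, lambda_l1=1e-4):
    # Reconstruction loss: no labels are needed, only teacher activations.
    h_hat = student(h_teacher)
    recon = nn.functional.mse_loss(h_hat, h_teacher)
    # Sparsity-inducing LASSO penalty on the student parameters.
    l1 = sum(p.abs().sum() for p in student.parameters())
    loss = recon + lambda_l1 * l1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: h_teacher would be activations captured from a trained teacher layer
# (e.g. a VGG-16 feature map flattened per sample); here it is a placeholder.
student = StudentProjection(d_teacher=512, d_student=128)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
h_teacher = torch.randn(64, 512)
train_step(student, h_teacher, optimizer)
```

Because only the teacher's activations drive the loss, this sketch illustrates why the approach needs no labeled data and how the L1 term encourages prunable (sparse) student weights.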
This work was funded by the National Research Fund (FNR), Luxembourg, under the project references R-AGR-0424-05-D/Björn Ottersten and CPPP17/IS/11643091/IDform/Aouada.

Oyebade K. Oyedotun
[email protected]

Abd El Rahman Shabayek
[email protected]

Djamila Aouada
[email protected]

Björn Ottersten
[email protected]

1 Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg, 1855, Luxembourg

1 Introduction

Many computer vision tasks work well with features that are learned using deep neural networks (DNNs) of few
(i.e. 3-10) layers of latent representations [1, 2]. However, in recent times, analysing more complex multimedia with reasonable accuracy has necessitated relying on features learned from deep networks with many layers of latent representations (i.e. over 10 layers) [3–6], as many works [6–8] posit the benefit of depth and width for approximating complex target functions. The success of very deep networks in learning hard multimedia tasks has motivated their deployment in various electronic devices. However, memory consumption is a