Improving the accuracy of pruned network using knowledge distillation
SHORT PAPER
Improving the accuracy of pruned network using knowledge distillation

Setya Widyawan Prakosa1 · Jenq‑Shiou Leu1 · Zhao‑Hong Chen2

Received: 2 October 2018 / Accepted: 4 November 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

* Jenq‑Shiou Leu (corresponding author): [email protected]; [email protected]
Zhao‑Hong Chen: [email protected]

1 Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
2 Industrial Technology Research Institute, Hsinchu, Taiwan
Abstract
The introduction of convolutional neural networks (CNNs) to the image processing field has attracted researchers to explore the applications of CNNs. Several network designs have been proposed to reach state-of-the-art capability. However, current neural network designs still face an issue related to model size, so some researchers have introduced techniques to reduce or compress it. A compression technique may, however, lower the accuracy of the compressed model compared to the original one and may influence the performance of the new model. We therefore need a scheme to enhance the accuracy of the compressed network. In this study, we show that knowledge distillation (KD) can be integrated with one of the pruning methodologies, namely pruning filters, as the compression technique to enhance the accuracy of the pruned model. From all experimental results, we conclude that incorporating KD when creating a MobileNets model can enhance the accuracy of the pruned network without elongating the inference time: the measured inference time of the model trained with KD is only 0.1 s longer than that of the model trained without KD. Furthermore, when the model size is reduced by 26.08%, the accuracy without KD is 63.65%, and incorporating KD enhances it to 65.37%.

Keywords Convolutional neural networks (CNN) · Compression technique · Pruning filters · Knowledge distillation (KD) · Accuracy · Inference time
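To make the two ingredients named above concrete, the following is a minimal sketch (in PyTorch, which the paper does not prescribe) pairing the L1-norm criterion commonly used for pruning filters with the Hinton-style distillation loss used to recover a pruned model's accuracy. The function names, temperature T, weight alpha, and pruning ratio are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def filters_to_prune(conv: nn.Conv2d, ratio: float):
    """Rank the filters of a conv layer by L1 norm (a common criterion in
    pruning-filters approaches) and return the indices of the weakest ones.
    `ratio` is an illustrative pruning fraction, not taken from the paper."""
    # weight shape: (out_channels, in_channels, kH, kW)
    l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_prune = int(ratio * conv.out_channels)
    return torch.argsort(l1)[:n_prune]

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD loss: KL divergence to the teacher's temperature-
    softened outputs, blended with the ordinary hard-label loss. T and
    alpha are placeholder values."""
    # The KL term is scaled by T^2 so its gradients keep roughly the same
    # magnitude as the hard-label term.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage sketch (names hypothetical): `teacher` is the original unpruned
# network, `student` the filter-pruned one being fine-tuned.
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)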
1 Introduction

Currently, various schemes of deep neural networks (DNNs) have been introduced to satisfy the requirements of specific tasks. The application of one neural network scheme, namely the convolutional neural network (CNN), to gender classification is presented in [1]. In addition, Zheng et al. [2] studied the application of CNNs to action detection using a two-stream CNN. DNNs started gaining popularity when AlexNet [3] was introduced in 2012 for the ImageNet competition, where it outperformed all other methodologies attempting to solve the task. In the years following the introduction of AlexNet, applications and new structures of DNNs became a research trend in this field. The VGG structure [4] was introduced in 2014, followed by the Residual Network (ResNet) [5], the inception layer [6, 7], and the combination of ResNet and inception [8], which is regarded as a benchmark for state-of-the-art networks in the computer vision field. The undoubted fact is that we can increase