ORIGINAL PAPER

A pruning method based on the measurement of feature extraction ability

Honggang Wu1 · Yi Tang2 · Xiang Zhang2

Received: 28 September 2019 / Revised: 17 October 2020 / Accepted: 26 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Yi Tang (corresponding author): [email protected] · Honggang Wu: [email protected] · Xiang Zhang: [email protected]

1 Civil Aviation Administration of China, Chengdu 610041, China
2 University of Electronic Science and Technology of China, Chengdu 611731, China

Abstract
As the network structure of convolutional neural networks (CNNs) becomes deeper and wider, network optimization techniques such as pruning have received ever-increasing research attention. This paper proposes a new pruning strategy based on Feature Extraction Ability Measurement (FEAM), a novel index that captures feature extraction ability from both theoretical analysis and practical operation. First, FEAM is computed as the product of kernel sparsity and feature dispersion: kernel sparsity describes the feature extraction ability in theory, while feature dispersion represents it in practical operation. Second, the FEAMs of all filters in the network are normalized so that the pruning operation can be applied to filters across layers. Finally, filters with weak FEAM are pruned to obtain a compact CNN model, and fine-tuning is adopted to restore the generalization ability. Experiments on CIFAR-10 and CUB-200-2011 demonstrate the effectiveness of our method.

Keywords Pruning · Kernel sparsity · Feature dispersion · Feature extraction ability
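To make the pipeline summarized in the abstract concrete, the sketch below shows one way the FEAM-based ranking could be organized. It is only an illustration, not the authors' implementation: the paper's exact formulas for kernel sparsity and feature dispersion are defined in later sections, so here kernel sparsity is approximated by an L1/L2 statistic of each filter's weights, feature dispersion by the spread of the filter's responses on a sample batch, and all function and variable names (`feam_scores`, `filters_to_prune`, etc.) are hypothetical.

```python
# Illustrative sketch of FEAM-based filter ranking (assumed proxies, not the paper's exact formulas).
import numpy as np

def kernel_sparsity(filter_weights):
    # filter_weights: (in_channels, kh, kw) weights of one filter.
    # Assumed sparsity proxy based on the ratio of L1 to L2 norms.
    w = np.abs(filter_weights).ravel()
    return w.sum() / (np.sqrt(w.size) * np.linalg.norm(w) + 1e-12)

def feature_dispersion(feature_maps):
    # feature_maps: (batch, H, W) responses of one filter on sample images.
    # Assumed dispersion proxy: standard deviation of the responses.
    return feature_maps.std()

def feam_scores(weights_per_layer, responses_per_layer):
    """Compute a globally normalized FEAM score for every filter.

    weights_per_layer[l]   : (out_channels, in_channels, kh, kw)
    responses_per_layer[l] : (out_channels, batch, H, W)
    """
    scores = []  # (layer_index, filter_index, raw_score)
    for l, (W, R) in enumerate(zip(weights_per_layer, responses_per_layer)):
        for i in range(W.shape[0]):
            # FEAM = kernel sparsity * feature dispersion, per the abstract.
            scores.append((l, i, kernel_sparsity(W[i]) * feature_dispersion(R[i])))
    # Normalize across the whole network so filters from different layers are comparable.
    vals = np.array([s for _, _, s in scores])
    vals = (vals - vals.min()) / (vals.max() - vals.min() + 1e-12)
    return [(l, i, v) for (l, i, _), v in zip(scores, vals)]

def filters_to_prune(scores, ratio=0.3):
    # Select the filters with the weakest normalized FEAM; fine-tuning follows pruning.
    return sorted(scores, key=lambda t: t[2])[: int(len(scores) * ratio)]
```

In use, `weights_per_layer` would be collected from the trained CNN, `responses_per_layer` from a forward pass over a small calibration batch, and the network would be fine-tuned after the selected filters are removed.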

1 Introduction

It is well known that convolutional neural networks (CNNs) have achieved great success in various computer vision tasks [1], including object detection [2–4], object classification [5,6], semantic segmentation [7,8], and many others. However, as CNN convolutional layers become deeper and wider, higher computational overhead and larger memory are required, which makes it difficult to deploy CNN models on resource-limited devices [9]. For instance, the AlexNet [10] network contains about 6 × 10^7 parameters, while larger networks such as VGG [11] contain about 1.38 × 10^8 parameters. Even for less complex tasks, such as simple image recognition, the VGG network still requires more than 500 MB of memory and 1.56 × 10^10 floating-point operations (FLOPs).
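As a quick back-of-the-envelope check of the figures above (an illustrative estimate, not taken from the paper), storing the quoted parameter count as 32-bit floats already accounts for the stated memory footprint:

```python
# Rough storage estimate for VGG's parameters in float32 (illustrative only).
params = 1.38e8          # approximate VGG parameter count quoted above
bytes_per_param = 4      # 32-bit floating point
print(f"{params * bytes_per_param / 1e6:.0f} MB")  # ~552 MB, consistent with "more than 500 MB"
```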


The overparameterization of deep learning is a major obstacle to the application of CNNs [12]. Thus, network compression has drawn a significant amount of interest from both academia and industry. In recent years, numerous efficient compression methods have been proposed, including low-rank approximation [12,13], parameter quantization [14,15], and binarization [16]. Among them, network pruning [17–20] achieves excellent performance in reducing the redundancy of CNNs, and it offers better model deployment ability than the other methods. Network pruning removes unimportant connections from a well-trained network with negligible impact on network performance. In this paper, a