Accelerating Convolutional Neural Networks with Dominant Convolutional Kernel and Knowledge Pre-regression



Abstract. Aiming at accelerating the test time of deep convolutional neural networks (CNNs), we propose a model compression method that combines a novel dominant kernel (DK) with a new training method called knowledge pre-regression (KP). In the combined model DK\(^2\)PNet, DK is employed to achieve a substantial low-rank decomposition of convolutional kernels, while KP is used to transfer knowledge of intermediate hidden layers from a larger teacher network to its compressed student network using a cross-entropy loss function instead of the previously used Euclidean distance. Experimental results on the CIFAR-10, CIFAR-100, MNIST, and SVHN benchmarks show that, compared to the latest results, our DK\(^2\)PNet achieves accuracy close to the state of the art while requiring dramatically fewer model parameters.

Keywords: Dominant convolutional kernel · Knowledge pre-regression · Model compression · Knowledge distilling
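To make the KP idea concrete, the following sketch (our illustration, not the authors' code) matches intermediate-layer predictions of a teacher and its compressed student with a cross-entropy loss rather than a Euclidean hint loss. The auxiliary head, the temperature T, and all names below are assumptions made for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class KPHead(nn.Module):
    # Hypothetical auxiliary head: maps an intermediate feature map to class logits.
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # global average pooling
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat):
        return self.fc(self.pool(feat).flatten(1))

def kp_loss(student_feat, teacher_feat, student_head, teacher_head, T=2.0):
    # Cross entropy H(p_teacher, p_student) between softened intermediate predictions;
    # this replaces an L2 (Euclidean) distance on the raw hidden features.
    with torch.no_grad():
        p_teacher = F.softmax(teacher_head(teacher_feat) / T, dim=1)
    log_p_student = F.log_softmax(student_head(student_feat) / T, dim=1)
    return -(p_teacher * log_p_student).sum(dim=1).mean()

In training, such a loss would be added to the student's usual classification loss; the weighting and the choice of intermediate layers are design choices not specified here.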

1 Introduction

In recent years, deep convolutional neural networks (CNNs) have made impressive progress on several computer vision tasks such as image classification [1] and object detection and localization [2,3]. On many benchmark challenges [1,4–6], records have been consecutively broken by CNNs since 2012 [1]. Surprising performance, however, usually comes with a heavy computational burden due to the use of deeper and/or wider architectures. Complicated models with numerous parameters may lead to unacceptable test or inference times for a variety of real applications. To resolve this challenging problem, there was an early interest in hardware-specific optimization [1,7–10], but such optimization alone is unlikely to meet the increasing demand for model acceleration in the era of the mobile Internet, because huge numbers of portable devices such as smart phones and tablets are equipped with low-end CPUs and GPUs as well as limited memory.

To speed up cumbersome CNNs, one approach is to directly compress existing large models or ensembles into small and fast-to-execute models [11–15]. Another is to employ deep and wide top-performing teacher networks to train shallow and/or thin student networks [16–19]. Based on low-rank expansions, the convolutional operator can be decomposed into two procedures: feature extraction and feature combination (Sect. 3.1). Initially inspired by the low-rank approximation of responses proposed by Zhang et al. [15], this paper proposes a novel dominant convolutional kernel (DK) for greatly compressing filter banks. To deal with the performance degradation caused by model compression, we present a new knowledge pre-regression (KP) training method for compressed CNN architectures, which extends the FitNet training method [19] to make it much easier to converge and implement. Such a KP-based training method fills the intermediate representation gap between the teacher and student networks.
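As a rough illustration of the low-rank view above (a sketch under our own assumptions, not the exact DK construction), a full k x k convolution with many output channels can be replaced by a small set of k x k kernels for feature extraction, followed by 1 x 1 convolutions for feature combination:

import torch.nn as nn

def lowrank_conv(in_ch, out_ch, k=3, d_prime=16, padding=1):
    # Hypothetical factorized replacement for nn.Conv2d(in_ch, out_ch, k);
    # d_prime controls the rank of the approximation.
    return nn.Sequential(
        nn.Conv2d(in_ch, d_prime, kernel_size=k, padding=padding, bias=False),  # feature extraction
        nn.Conv2d(d_prime, out_ch, kernel_size=1, bias=True),                   # feature combination
    )

The parameter count drops from roughly in_ch * out_ch * k * k to in_ch * d_prime * k * k + d_prime * out_ch, a large saving whenever d_prime is much smaller than out_ch.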