Weight asynchronous update: Improving the diversity of filters in a deep convolutional network
Dejun Zhang, Linchao He, Mengting Luo, Zhanya Xu, and Fazhi He
© The Author(s) 2020.
Abstract  Deep convolutional networks have achieved remarkable results on various visual tasks due to their strong ability to learn a variety of features. A well-trained deep convolutional network can be compressed to 20%–40% of its original size by removing filters that make little contribution, as many overlapping features are generated by redundant filters. Model compression can reduce the number of unnecessary filters, but it does not take advantage of redundant filters because the training phase is unaffected. Modern networks with residual and dense connections and inception blocks are considered able to mitigate the overlap among convolutional filters, but they do not necessarily overcome the issue. To do so, we propose a new training strategy, weight asynchronous update, which significantly increases the diversity of filters and enhances the representation ability of the network. The proposed method can be applied to a wide range of convolutional networks without changing the network topology. Our experiments show that updating a stochastic subset of filters in different iterations significantly reduces filter overlap in convolutional networks. Extensive experiments show that our method yields noteworthy improvements in neural network performance.
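To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of how a stochastic subset of convolutional filters could be updated in each iteration of a PyTorch-style training loop. The function name mask_filter_grads and the update_ratio parameter are illustrative assumptions, not names from the paper.

# A minimal sketch of weight-asynchronous updating: in each training iteration,
# only a random subset of convolutional filters receives a gradient update,
# while the remaining filters keep their previous weights.
import torch
import torch.nn as nn

def mask_filter_grads(model: nn.Module, update_ratio: float = 0.5) -> None:
    """Zero the gradients of a random subset of filters in every Conv2d layer,
    so that only roughly `update_ratio` of the filters are updated this step."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d) and module.weight.grad is not None:
            out_channels = module.weight.shape[0]
            # Boolean mask over output channels: True = update this filter now.
            keep = torch.rand(out_channels, device=module.weight.device) < update_ratio
            module.weight.grad[~keep] = 0.0
            if module.bias is not None and module.bias.grad is not None:
                module.bias.grad[~keep] = 0.0

# Usage inside an ordinary training loop (model, loader, criterion, optimizer assumed):
# for images, labels in loader:
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     mask_filter_grads(model, update_ratio=0.5)  # asynchronous filter updates
#     optimizer.step()

Because the gradient mask is resampled every iteration, different filters are updated at different times, which is one simple way to realize the "stochastic subset of filters" described above without changing the network topology.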
Introduction
In the past few years, deep learning methods based on convolutional neural networks (CNNs) have achieved significant success in machine vision [1, 2], shape representation [3–5], speech recognition [6, 7], natural language processing [8–10], etc. In particular, many advanced deep convolutional networks have been proposed to handle visual tasks. For example, the success of deep residual nets has inspired researchers to explore deeper, wider, and more complex frameworks [11, 12].

Deep convolutional networks possess strong learning capability owing to their rich sets of parameters. However, the number of parameters can at times be excessive, which leads to overlapping and redundant features; it also causes overfitting to the training set and a lack of generalization to new data. Several modern networks with hundreds of layers (e.g., ResNet [13], DenseNet [11], and Inception [14]) employ an architectural approach to alleviate these problems. One key idea is that residual connections in early layers and feature fusion can be regarded as adding noise in the feature space, which regularizes the network and hence reduces the overlap of learned deep features.

A trained network may be further compressed by pruning, quantization, or binarization, which typically exploits the redundancy in the weights of the trained network. In general, the purpose of model compression, rather than optimizing the capacity of the network during training, is to minimize memory requirements and to accelerate inference without degrading performance. Exploring the best perf