Exponential Discretization of Weights of Neural Network Connections in Pre-Trained Neural Network. Part II: Correlation Maximization
M. M. Pushkareva (a, *) and I. M. Karandashev (a, b, **)

(a) Scientific Research Institute for System Analysis, Russian Academy of Sciences, Moscow, 117218 Russia
(b) Peoples' Friendship University of Russia (RUDN University), Moscow, 117198 Russia
* e-mail: [email protected]
** e-mail: [email protected]

Received May 12, 2019; revised May 27, 2020; accepted June 1, 2020
Abstract—In this article, we develop a method of linear and exponential quantization of neural network weights. We improve it by maximizing the correlation between the initial and quantized weights, taking into account the weight density distribution in each layer. We perform the quantization after the neural network training, without any subsequent post-training, and compare our algorithm with plain linear and exponential quantization. The quality of the VGG-16 neural network is already satisfactory (top-5 accuracy of 76%) in the case of 3-bit exponential quantization. At 4 bits, the ResNet50 and Xception neural networks show top-5 accuracies of 79% and 61%, respectively.

Keywords: weight quantization, correlation maximization, exponential quantization, neural network, neural network compression, reduction of bit depth of weights

DOI: 10.3103/S1060992X20030042
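To make the notion of exponential (logarithmic-scale) quantization concrete, the following is a minimal sketch in Python/NumPy. The construction of the levels as a geometric grid between the smallest and largest weight magnitudes, and the helper name exponential_quantize, are illustrative assumptions rather than the exact scheme of the paper.

```python
# Sketch of n-bit exponential (logarithmic) weight quantization for one layer,
# assuming the weights are given as a NumPy array. Not the paper's exact scheme.
import numpy as np

def exponential_quantize(w, bits=3):
    """Map each weight to the nearest signed exponential level."""
    n_levels = 2 ** (bits - 1)                    # magnitude levels per sign
    w_abs = np.abs(w)
    w_max = w_abs.max()
    w_min = np.percentile(w_abs[w_abs > 0], 1)    # small positive floor, avoids log(0)
    # Geometric grid of magnitudes between w_min and w_max.
    levels = w_min * (w_max / w_min) ** (np.arange(n_levels) / (n_levels - 1))
    # Nearest-level assignment for every weight magnitude.
    idx = np.argmin(np.abs(w_abs[..., None] - levels), axis=-1)
    return np.sign(w) * levels[idx]

# Usage example: quantize a random weight matrix to 3 bits.
w = np.random.randn(256, 256).astype(np.float32)
w_q = exponential_quantize(w, bits=3)
print("correlation with original weights:", np.corrcoef(w.ravel(), w_q.ravel())[0, 1])
```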
1. INTRODUCTION

The majority of neural networks used for image recognition have many parameters that have to be stored. Consequently, a substantial memory capacity is necessary, and this requirement limits the applicability of such neural networks. For example, the storage sizes of the VGG-16 and ResNet152V2 neural networks are 528 [4] and 232 [5] MB, respectively.

There are different methods that allow one to decrease the size of a pre-trained neural network. Usually, the number of weights can be reduced with the aid of such methods as pruning algorithms [14], weight sharing (including application of the convolution operation) [15], tensor decomposition [16], and so on. In particular, one of these methods is quantization, that is, a reduction of the bit width of the neural network weights achieved by dividing the weight distribution interval into discrete values. The most popular quantization methods are the use of fixed-point formats in place of floating-point formats [10], binarization [11], ternarization [12], the use of a logarithmic scale [13], and so on.

The authors of paper [8] discussed the optimal quantization problem for the Hopfield neural network. They showed that by maximizing the correlation between the initial and quantized values of the weights it was possible to minimize the errors of the quantized neural network. We believe this result to be correct, and in what follows we choose the interval boundaries and the quantized values of the weights inside the intervals proceeding from the maximal-correlation principle (see the sketch below). Most realizations of quantization of neural network weights include re-training of the neural network in the course of the quantization.
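The following sketch illustrates the correlation-maximization idea for a fixed set of interval boundaries: assigning each interval the mean of the weights that fall into it is the least-squares representative and maximizes the correlation between the initial and quantized weights for that partition. The equal-probability placement of the boundaries and the function name correlation_quantize are assumptions made for illustration, not the algorithm developed in the paper.

```python
# Sketch of correlation-maximizing level assignment for a fixed partition of the
# weight range. Boundary placement (equal-probability bins) is an assumption.
import numpy as np

def correlation_quantize(w, bits=3):
    flat = w.ravel()
    n_bins = 2 ** bits
    # Interval boundaries taken from the empirical weight distribution.
    edges = np.quantile(flat, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, flat, side="right") - 1, 0, n_bins - 1)
    # Correlation-maximizing level for each interval: the in-bin mean.
    levels = np.array([flat[idx == b].mean() if np.any(idx == b) else 0.0
                       for b in range(n_bins)])
    return levels[idx].reshape(w.shape)

# Usage example: correlation between original and quantized weights vs. bit width.
w = np.random.randn(512, 512).astype(np.float32)
for b in (2, 3, 4):
    w_q = correlation_quantize(w, bits=b)
    r = np.corrcoef(w.ravel(), w_q.ravel())[0, 1]
    print(f"{b}-bit quantization, correlation with original weights: {r:.4f}")
```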