Base-Reconfigurable Segmented Logarithmic Quantization and Hardware Design for Deep Neural Networks
Jiawei Xu1 · Yuxiang Huan1 · Yi Jin1 · Haoming Chu1 · Li-Rong Zheng1 · Zhuo Zou1
Received: 18 December 2019 / Revised: 30 April 2020 / Accepted: 20 May 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
The growth in the size of deep neural network (DNN) models poses both computational and memory challenges to the efficient and effective implementation of DNNs on platforms with limited hardware resources. Our work on segmented logarithmic (SegLog) quantization, which adopts both base-2 and base-√2 logarithmic encoding, reduces inference cost with only a small accuracy penalty. However, the weight distribution varies among layers in different DNN models, so different base-2 : base-√2 ratios are required to reach the best accuracy, which in turn calls for different hardware designs for the decoding and computing parts. This paper extends the idea of SegLog quantization by using a layer-wise base-2 : base-√2 ratio for weight quantization. The proposed base-reconfigurable segmented logarithmic (BRSLog) quantization achieves 6.4x weight compression with a 1.66% Top-5 accuracy drop on AlexNet at 5-bit resolution. An arithmetic element supporting BRSLog-quantized DNN inference is proposed that adapts to different base-2 : base-√2 ratios. With a √2 approximation, the resource-consuming multipliers can be replaced by shifters and adders at only a 0.54% accuracy penalty. The proposed arithmetic element is simulated in the UMC 55 nm Low Power process; it is 50.42% smaller in area and 55.60% lower in power consumption than the widely used 16-bit fixed-point multiplier. Compared with an equivalent SegLog arithmetic element designed for a fixed base-2 : base-√2 ratio, the base-reconfigurable part increases the area by only 22.96 μm² and the power consumption by only 2.6 μW.

Keywords Logarithmic quantization · Neural network · Arithmetic element · Embedded intelligence
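To make the encoding concrete, the following Python/NumPy sketch illustrates the general idea: each weight is quantized to the nearest power of 2 or of √2, and a multiplication by √2 can be approximated with shifts and adds so that the datapath needs no multiplier. The segment threshold, the base-2 : base-√2 split, and the particular shift-and-add decomposition below are illustrative assumptions, not the paper's exact scheme.

    # Minimal sketch of segmented logarithmic quantization with base-2 and
    # base-sqrt(2) codes, plus a shift-and-add approximation of x * sqrt(2).
    # Threshold, base split, and approximation are illustrative assumptions.
    import numpy as np

    def log_quantize(w, base):
        """Quantize |w| to the nearest power of `base`, keeping the sign."""
        sign = np.sign(w)
        mag = np.abs(w)
        mag = np.where(mag == 0, np.finfo(np.float32).tiny, mag)  # avoid log(0)
        exponent = np.round(np.log(mag) / np.log(base))
        return sign * np.power(base, exponent)

    def seglog_quantize(w, threshold=2.0 ** -4):
        """Toy segmented scheme: base-sqrt(2) (finer steps) above the
        threshold, base-2 (coarser steps) below it; threshold is hypothetical."""
        fine = log_quantize(w, np.sqrt(2.0))
        coarse = log_quantize(w, 2.0)
        return np.where(np.abs(w) >= threshold, fine, coarse)

    def times_sqrt2_shift_add(x):
        """Approximate x * sqrt(2) with shifts and adds only:
        1 + 1/2 - 1/16 - 1/64 = 1.421875 (about 0.5% high). This particular
        decomposition is an example, not necessarily the paper's."""
        return x + (x >> 1) - (x >> 4) - (x >> 6)

    w = (np.random.randn(8) * 0.1).astype(np.float32)
    print(np.c_[w, seglog_quantize(w)])
    print(times_sqrt2_shift_add(1000))  # 1423, vs. 1000 * sqrt(2) ~= 1414

With such an encoding, multiplying an activation by a quantized weight reduces to a shift by the integer part of the exponent, plus one extra shift-and-add correction whenever the √2 factor is present, which is what allows the multipliers to be removed from the arithmetic element.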
Jiawei Xu and Yuxiang Huan contributed equally to this work.

Li-Rong Zheng
[email protected]

Zhuo Zou
[email protected]

Jiawei Xu
[email protected]

1 State Key Laboratory of ASIC and System, Fudan University, Shanghai, China

1 Introduction

Recent Deep Neural Networks (DNNs) have shown state-of-the-art performance on various complex computer vision tasks, such as image classification, object detection and scene understanding [1, 2]. The pursuit of high accuracy leads to growth in network size, which results in high memory requirements and computational complexity.
For instance, 32-bit floating-point AlexNet requires 8.8 MB of filter-weight storage (2.3 million filter weights) for the convolutional (CONV) layers, 223.6 MB of weight storage (58.6 million weights) for the fully-connected (FC) layers, and 724.4 million MACs (665.8 million for the CONV layers and 58.6 million for the FC layers) per inference on a 227×227 RGB image. Memory requirement and computational complexity are therefore the two key considerations for DNN deployment on mobile and embedded platforms.
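As a quick sanity check on these figures, the storage numbers follow from 4 bytes per 32-bit weight, with MB read as 2^20 bytes; small differences come from the rounded weight counts:

    conv_weights = 2.3e6   # filter weights in CONV layers (rounded)
    fc_weights = 58.6e6    # weights in FC layers (rounded)
    bytes_per_weight = 4   # 32-bit floating point

    print(f"CONV storage: {conv_weights * bytes_per_weight / 2**20:.1f} MB")  # ~8.8 MB
    print(f"FC storage:   {fc_weights * bytes_per_weight / 2**20:.1f} MB")    # ~223.5 MB
    print(f"MACs/image:   {(665.8e6 + 58.6e6) / 1e6:.1f} million")            # 724.4 million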