Efficient Design of Pruned Convolutional Neural Networks on FPGA


Mário Véstias¹

Received: 21 April 2020 / Revised: 21 April 2020 / Accepted: 8 October 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Convolutional Neural Networks (CNNs) have improved several computer vision applications, such as object detection and classification, compared to other machine learning algorithms. Running these models on edge computing devices close to the data sources is attracting the attention of the community, since it avoids high-latency communication of private data for cloud processing and enables real-time decisions, turning these systems into smart embedded devices. However, running these models is computationally very demanding and requires a large amount of memory, both of which are scarce in edge devices compared to a cloud data center. In this paper, we propose an architecture for the inference of pruned convolutional neural networks on FPGAs of any density. A configurable block pruning method is proposed, together with an architecture that supports the efficient execution of pruned networks. Pruning and batching are also studied together to determine how they influence each other. With the proposed architecture, we run the inference of a CNN with an average performance of 322 GOPs for 8-bit data on a XC7Z020 FPGA. The proposed architecture running AlexNet processes 240 images/s on a ZYNQ7020 and 775 images/s on a ZYNQ7045 with only 1.2% accuracy degradation.

Keywords Deep learning · Convolutional neural network · FPGA · Block pruning · Edge computing
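To make the idea of block pruning mentioned in the abstract concrete, the following is a minimal NumPy sketch of magnitude-based block pruning of a weight matrix: whole fixed-size blocks with the lowest L1 magnitude are zeroed. This is only an illustration under assumed parameters (the `block_shape` and `sparsity` values, and L1 scoring, are assumptions); the paper's configurable block pruning method is more elaborate.

```python
import numpy as np

np.random.seed(0)  # reproducible example

def block_prune(weights, block_shape=(4, 4), sparsity=0.5):
    """Zero out the lowest-magnitude blocks of a 2D weight matrix.

    Illustrative sketch: scores each (br x bc) block by its L1 norm
    and zeroes the `sparsity` fraction with the smallest scores.
    """
    rows, cols = weights.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0
    # View the matrix as a grid of blocks: (rows//br, br, cols//bc, bc).
    blocks = weights.reshape(rows // br, br, cols // bc, bc)
    scores = np.abs(blocks).sum(axis=(1, 3))
    # Zero the k lowest-scoring blocks.
    k = int(sparsity * scores.size)
    threshold = np.sort(scores, axis=None)[k]
    mask = (scores >= threshold)[:, None, :, None]
    return (blocks * mask).reshape(rows, cols)

w = np.random.randn(8, 8)
pruned = block_prune(w, block_shape=(4, 4), sparsity=0.5)
```

Pruning at block granularity, rather than individual weights, keeps the surviving weights in regular chunks, which is what makes the sparsity exploitable by a hardware accelerator.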

1 Introduction

Deep neural networks (DNNs) have shown very promising results in computer vision applications such as object detection and classification [1]. The convolutional neural network (CNN) is a type of DNN used to classify images and one of the most researched and widely deployed deep neural network models. By identifying correlations among pixels, a CNN is able to classify the object present in an image as belonging to a predetermined class. CNNs differ from other DNN models in that they use a particular class of layers known as convolutional layers. These layers apply a set of 3D convolutions between 3D kernels of weights and the maps of the previous layer to produce a set of output maps for the next layer. A sequence of these layers identifies features of the image whose complexity increases with the depth of the network. In the final layers of a CNN, all features are associated with a class with a certain probability.

One of the first CNNs was LeNet [2], with a total of 60K weights distributed across five layers. The network was

Mário Véstias
[email protected]

1 INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Lisbon, Portugal
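The 3D convolution performed by the convolutional layers described above, where each 3D kernel spans all input maps and slides spatially to produce one output map, can be sketched directly in NumPy. This is an illustrative sketch only: stride 1, no padding, no bias, and the example shapes are assumptions (and, as is usual for CNNs, no kernel flipping is performed).

```python
import numpy as np

np.random.seed(0)  # reproducible example

def conv_layer(in_maps, kernels):
    """One convolutional layer: each 3D kernel covers all C input maps
    and slides spatially to produce one output map (stride 1, no padding)."""
    C, H, W = in_maps.shape
    M, C_k, K, _ = kernels.shape
    assert C == C_k
    out = np.zeros((M, H - K + 1, W - K + 1))
    for m in range(M):                      # one output map per kernel
        for y in range(H - K + 1):
            for x in range(W - K + 1):
                # C*K*K multiply-accumulate operations per output pixel
                out[m, y, x] = np.sum(in_maps[:, y:y + K, x:x + K] * kernels[m])
    return out

maps = np.random.randn(3, 8, 8)     # C=3 input maps of size 8x8
w = np.random.randn(4, 3, 3, 3)     # M=4 kernels, each 3x3x3
out_maps = conv_layer(maps, w)      # shape (4, 6, 6)
```

Counting the multiply-accumulates in the inner loop (M·C·K²·H'·W' per layer) is what leads to the operation counts quoted for networks such as AlexNet below.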

applied to digit classification with small images. AlexNet [3], a deeper and more complex CNN, was presented at the ImageNet Challenge for image classification, with eight layers, a total of 61M weights, and 724M MAC (Multiply-Accumulate) operations to process images of size 224 ×