Efficient Design of Pruned Convolutional Neural Networks on FPGA


Mário Véstias¹

Received: 21 April 2020 / Revised: 21 April 2020 / Accepted: 8 October 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Convolutional Neural Networks (CNNs) have improved several computer vision applications, such as object detection and classification, compared to other machine learning algorithms. Running these models on edge computing devices close to the data sources is attracting the attention of the community, since it avoids high-latency communication of private data for cloud processing and enables real-time decisions, turning these systems into smart embedded devices. However, running these models is computationally very demanding and requires a large amount of memory, both of which are scarce in edge devices compared to a cloud data center. In this paper, we propose an architecture for the inference of pruned convolutional neural networks on FPGAs of any density. A configurable block pruning method is proposed, together with an architecture that supports the efficient execution of pruned networks. Pruning and batching are also studied together to determine how they influence each other. With the proposed architecture, we run the inference of a CNN with an average performance of 322 GOPs for 8-bit data on a XC7Z020 FPGA. The proposed architecture running AlexNet processes 240 images/s on a ZYNQ7020 and 775 images/s on a ZYNQ7045 with only 1.2% accuracy degradation.

Keywords Deep learning · Convolutional neural network · FPGA · Block pruning · Edge computing
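To make the idea of block pruning mentioned in the abstract concrete, the following is a minimal NumPy sketch of magnitude-based block pruning of a weight matrix: whole fixed-size blocks with the lowest L1 magnitude are zeroed. This is only an illustration under assumed parameters (the `block_shape` and `sparsity` values, and L1 scoring, are assumptions); the paper's configurable block pruning method is more elaborate.

```python
import numpy as np

np.random.seed(0)  # reproducible example

def block_prune(weights, block_shape=(4, 4), sparsity=0.5):
    """Zero out the lowest-magnitude blocks of a 2D weight matrix.

    Illustrative sketch: scores each (br x bc) block by its L1 norm
    and zeroes the `sparsity` fraction with the smallest scores.
    """
    rows, cols = weights.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0
    # View the matrix as a grid of blocks: (rows//br, br, cols//bc, bc).
    blocks = weights.reshape(rows // br, br, cols // bc, bc)
    scores = np.abs(blocks).sum(axis=(1, 3))
    # Zero the k lowest-scoring blocks.
    k = int(sparsity * scores.size)
    threshold = np.sort(scores, axis=None)[k]
    mask = (scores >= threshold)[:, None, :, None]
    return (blocks * mask).reshape(rows, cols)

w = np.random.randn(8, 8)
pruned = block_prune(w, block_shape=(4, 4), sparsity=0.5)
```

Pruning at block granularity, rather than individual weights, keeps the surviving weights in regular chunks, which is what makes the sparsity exploitable by a hardware accelerator.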

1 Introduction

Deep neural networks (DNNs) have shown very promising results in computer vision applications such as object detection and classification [1]. The convolutional neural network (CNN) is a type of DNN used to classify images and one of the most researched and widely deployed deep neural network models. By identifying correlations among pixels, a CNN is able to classify the object present in an image as belonging to a predetermined class. CNNs differ from other DNN models in that they use a particular class of layers known as convolutional layers. These layers apply a set of 3D convolutions between 3D kernels of weights and the maps of the previous layer to produce a set of output maps for the next layer. A sequence of these layers identifies features of the image whose complexity increases with the depth of the network. In the final layers of a CNN, all features are associated with a class with a certain probability.

One of the first CNNs was LeNet [2], with a total of 60K weights distributed across five layers. The network was

Mário Véstias
[email protected]

1 INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Lisbon, Portugal
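The 3D convolution performed by the convolutional layers described above, where each 3D kernel spans all input maps and slides spatially to produce one output map, can be sketched directly in NumPy. This is an illustrative sketch only: stride 1, no padding, no bias, and the example shapes are assumptions (and, as is usual for CNNs, no kernel flipping is performed).

```python
import numpy as np

np.random.seed(0)  # reproducible example

def conv_layer(in_maps, kernels):
    """One convolutional layer: each 3D kernel covers all C input maps
    and slides spatially to produce one output map (stride 1, no padding)."""
    C, H, W = in_maps.shape
    M, C_k, K, _ = kernels.shape
    assert C == C_k
    out = np.zeros((M, H - K + 1, W - K + 1))
    for m in range(M):                      # one output map per kernel
        for y in range(H - K + 1):
            for x in range(W - K + 1):
                # C*K*K multiply-accumulate operations per output pixel
                out[m, y, x] = np.sum(in_maps[:, y:y + K, x:x + K] * kernels[m])
    return out

maps = np.random.randn(3, 8, 8)     # C=3 input maps of size 8x8
w = np.random.randn(4, 3, 3, 3)     # M=4 kernels, each 3x3x3
out_maps = conv_layer(maps, w)      # shape (4, 6, 6)
```

Counting the multiply-accumulates in the inner loop (M·C·K²·H'·W' per layer) is what leads to the operation counts quoted for networks such as AlexNet below.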

applied to digit classification with small images. AlexNet [3], a deeper and more complex CNN, was presented at the ImageNet Challenge for image classification, with eight layers, a total of 61M weights, and 724M MAC (Multiply-Accumulate) operations to process images of size 224 ×