Deep Model Compression and Architecture Optimization for Embedded Systems: A Survey
Anthony Berthelier1 · Thierry Chateau1 · Stefan Duffner2 · Christophe Garcia2 · Christophe Blanc1
Received: 16 April 2020 / Revised: 7 August 2020 / Accepted: 3 September 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Over the past years, deep neural networks have proved to be an essential element for developing intelligent solutions. They have achieved remarkable performance, at the cost of deeper layers and millions of parameters. Therefore, utilising these networks on limited-resource platforms for smart cameras is a challenging task. In this context, models need to be (i) accelerated and (ii) memory efficient without significantly compromising performance. Numerous works have aimed to obtain smaller, faster and more accurate models. This paper presents a survey of methods suitable for porting deep neural networks onto resource-limited devices, especially smart cameras. These methods can be roughly divided into two main sections. In the first part, we present compression techniques, categorized into: knowledge distillation, pruning, quantization, hashing, reduction of numerical precision and binarization. In the second part, we focus on architecture optimization. We introduce methods to enhance network structures as well as neural architecture search techniques. In each part, we describe and analyse the different methods. Finally, we conclude this paper with a discussion of these methods. Keywords Deep learning · Compression · Neural networks · Architecture
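As a rough illustration of one of the compression techniques named above, the following NumPy sketch shows magnitude-based weight pruning: the smallest-magnitude weights of a layer are zeroed out to reach a target sparsity. The function name, the threshold choice and the example matrix are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Toy 2x2 weight matrix: half of the entries are pruned away
w = np.array([[0.9, -0.05], [0.02, -1.2]])
pruned = magnitude_prune(w, sparsity=0.5)
```

In practice the resulting sparse matrix only saves memory and time when stored in a sparse format or combined with hardware support; pruning is also usually followed by fine-tuning to recover accuracy.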
1 Introduction Since the advent of deep neural network architectures and their massively parallelized implementations [1, 2], deep learning based methods have achieved state-of-the-art performance in many applications such as face recognition, semantic segmentation and object detection. In order to achieve this performance, a high computation capability is needed, as these models usually have millions of parameters. Moreover, implementing these methods on resource-limited devices for smart cameras is difficult due to high memory consumption and strict size constraints. For example, AlexNet [1] is over 200 MB, and the milestone models that followed, such as VGG [3], GoogleNet [4] and ResNet [5], are not necessarily time or memory efficient. Thus finding solutions to implement
Anthony Berthelier
[email protected]
1 Institut Pascal, 4 Avenue Blaise Pascal, 63178 Aubiere, France
2 LIRIS - 20, Avenue Albert Einstein, 69621 Villeurbanne Cedex, France
deep models on resource-limited platforms such as mobile phones or smart cameras is essential. Each device has a different computational capacity. Therefore, to run these applications on embedded devices, the deep models need to have fewer parameters and be time efficient. Some work has been done focusing on dedicated hardware or FPGAs with a fixed, specific architecture. Having specific hardware is helpful to optimize a given application
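To make the memory argument concrete, the sketch below shows symmetric linear quantization, one of the techniques the survey covers: float32 weights are mapped to int8, cutting storage by 4x at the cost of a bounded rounding error. The function names and scale convention (max-absolute-value over 127) are illustrative assumptions, not a specific method from the survey.

```python
import numpy as np

def quantize_int8(x):
    """Map float weights to int8 with a single symmetric scale factor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, s = quantize_int8(w)   # q uses 1 byte per weight instead of 4
w_hat = dequantize(q, s)  # reconstruction error is at most s / 2 per weight
```

Per-channel scales and quantization-aware training are common refinements that reduce the accuracy loss further on real networks.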