A streaming architecture for Convolutional Neural Networks based on layer operations chaining


ORIGINAL RESEARCH PAPER

Moisés Arredondo‑Velázquez1 · Javier Diaz‑Carmona1 · Cesar Torres‑Huitzil2 · Alfredo Padilla‑Medina1 · Juan Prado‑Olivarez1

Received: 16 May 2019 / Accepted: 15 December 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Convolutional Neural Networks (CNN) have become one of the best machine learning algorithms for content classification of digital images. The computational complexity of CNNs is much higher than that of traditional algorithms, which is why Graphics Processing Units (GPU) and online servers are commonly used to accelerate their operations. However, there is a growing demand for real-time object recognition solutions, mainly implemented on embedded systems, which are limited in both resources and energy consumption. Recently reported works focus on minimizing the required resources through two design strategies. The first implements a single accelerator that can be adapted to the operations of the whole CNN. The second uses one accelerator for each convolution layer, which achieves higher performance when multiple images are processed. This paper proposes a new design strategy based on multiple accelerators that use a layer operation chaining scheme to compute the operations of multiple CNN layers in parallel. Three types of parallel data processing are adopted in the proposed architecture, and the parallelism level of the convolution layers is determined by cost-function-based algorithms. The proposed design strategy is demonstrated by implementing three naive CNNs on a DE2i-150 board, where a peak acceleration of 18.04x was achieved compared with state-of-the-art design methods without layer operation chaining. Furthermore, design results for a modified AlexNet CNN were obtained. According to the results, the proposed design strategy achieves a shorter processing time than reported works that use the other two design strategies. In addition, competitive resource utilization is obtained for the naive CNNs.
Keywords  Convolutional Neural Networks · Streaming architecture · Layer operation chaining

* Moisés Arredondo‑Velázquez [email protected]
Javier Diaz‑Carmona [email protected]
Cesar Torres‑Huitzil [email protected]

1 Electronics Engineering Department, Technological Institute of Celaya, Av. Tecnológico y G. Cubas, s/n, 38010 Celaya, GTO, Mexico

2 Tecnologico de Monterrey, School of Engineering and Sciences, Campus Puebla, Av. Atlixcayotl 5718, Puebla C.P., 72453 Puebla, Mexico

1 Introduction

Convolutional Neural Networks (CNN) have become a very useful tool in many aspects of daily life, since they are present in many commonly used services such as web browsers, social networks, smartphone apps, etc. The CNN techniques, as part of the machi