Deep learning parallel computing and evaluation for embedded system clustering architecture processor


Yue Zu

Received: 6 September 2019 / Accepted: 2 March 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

In the era of intelligence, processing large volumes of information and running a wide range of intelligent applications increasingly depend on embedded devices, and this trend has given machine learning algorithms an increasingly important role. High-performance embedded computing is an effective way to address the limited computing power of embedded devices. New intelligent embedded applications based on machine learning demand more computation than traditional embedded systems can readily supply. To address this problem, this paper studies parallel optimization and implementation techniques for convolutional neural networks on the Parallella platform. A parallel optimization strategy for convolutional neural networks on the clustering architecture processor of a heterogeneous multi-core system is given. The high-performance implementation of convolutional neural networks on the Parallella platform is then studied, and a working convolutional neural network system is implemented. A set of performance evaluation methods for embedded parallel processors is also proposed. From the application point of view of the S698P, the eCos operating system is selected as the platform; single-core and multi-core modes are compared on the GRSIM simulator, and a parallel performance evaluation is given. Experiments show that the efficiency of deep learning tasks is significantly improved compared with traditional parallel methods.

Keywords Clustered architecture processor · Parallel computing · Deep learning · Performance evaluation

Yue Zu [email protected] Department of Human Resources Office, Jilin Institute of Chemical Technology, Jilin 132022, China

1 Introduction

Industrial and automatic control was the earliest application field for embedded devices. With the development of technology and the advancement of society, embedded systems have become ever more widely used [1]. Today, virtually every aspect of human production and daily life depends on embedded devices, and the field still has broad room for development and application [2, 3]. At present, the development direction of embedded technology is to provide more intelligent and human-centered services [4, 5]. However, the development of traditional embedded systems has run into performance bottlenecks. On the one hand, high performance requirements and the need for structural scalability call for higher-performance embedded processors; on the other hand, power consumption and volume constraints limit how far the computational performance of an embedded processor can be raised [6, 7]. Therefore, researching new theory, new technology and new architecture of embedded processor, while meeting performance requirements, reduci