


Dynamic scheduler implementation used for load distribution between hardware accelerators (RTL) and software tasks (CPU) in heterogeneous systems

Cristian Andy Tănase¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract This article describes the implementation of a dynamic scheduler for load distribution between an RTL hardware accelerator and a CPU software task. A Xilinx Zynq SoC device is basically composed of a processing system (PS) coupled with FPGA programmable logic (PL). The two sections are connected via a number of Advanced eXtensible Interface (AXI) ports. Hardware accelerators are mechanisms whereby software algorithms are implemented in register transfer logic (RTL) in the PL. These accelerators increase processing speed. In this article, we present a dynamic scheduler that distributes the load between the host processor and the RTL accelerator. In some situations, even with its increased processing speed, the accelerator cannot keep up with the flow of data coming from the memory system (shared memory). The accelerator must then be “aided” by a software module running on a CPU in the PS. The scheduler described here checks whether a hardware data-processing module meets a hard real-time requirement (data are processed within a well-defined time frame). If it does not, the scheduler activates a software thread on a CPU to support the hardware thread: part of the data that the RTL thread would otherwise process is handled by the SW thread, so the RTL thread has less data to process. The scheduler activates the SW thread only when the system has to respond in real time and the amount of data cannot be processed within the required time. In this way, the scheduler detects when the software thread is needed to “help” the hardware thread process the data. The scheduler self-adjusts so that the software thread executes a number of instructions at all times without delaying the RTL thread, which is much faster. For this project, the PYNQ Z2 board, Vivado 2018.3 and Jupyter Notebook were used.
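The decision rule in the abstract (offload to the SW thread only when the RTL thread alone would miss the deadline) can be sketched as a small Python model. This is a hypothetical illustration, not the author's scheduler: it assumes the RTL and SW throughputs are known in items per second, that the SW thread takes only the overflow the RTL cannot finish before the deadline, and that `plan_split`, `Split` and `RateEstimator` are invented names. The moving-average estimator is one plausible way to obtain the "self-adjusting" behaviour; the article does not say it uses one.

```python
from dataclasses import dataclass


@dataclass
class Split:
    rtl_items: int        # items left to the RTL (hardware) thread
    sw_items: int         # items handed to the CPU software thread
    meets_deadline: bool  # True if the plan finishes within the deadline


def plan_split(n_items: int, rtl_rate: float, sw_rate: float, deadline: float) -> Split:
    """Hypothetical split rule: rates are throughputs in items/s, deadline in seconds."""
    if n_items < 0 or rtl_rate <= 0 or sw_rate <= 0 or deadline <= 0:
        raise ValueError("n_items must be >= 0; rates and deadline must be > 0")
    # Items the RTL thread can finish before the deadline (rounded down, conservative).
    rtl_capacity = int(rtl_rate * deadline)
    if n_items <= rtl_capacity:
        # Hardware alone meets the hard real-time requirement: SW thread stays idle.
        return Split(n_items, 0, True)
    # Assumption: the SW thread absorbs only the overflow, so RTL stays fully loaded.
    overflow = n_items - rtl_capacity
    if overflow <= int(sw_rate * deadline):
        return Split(rtl_capacity, overflow, True)
    # Even RTL + SW together miss the deadline: split in proportion to throughput
    # so both threads finish at about the same time (shortest total time).
    sw_items = round(n_items * sw_rate / (rtl_rate + sw_rate))
    return Split(n_items - sw_items, sw_items, False)


class RateEstimator:
    """Exponential moving average of observed throughput (items/s), hypothetical."""

    def __init__(self, initial_rate: float, alpha: float = 0.25):
        self.rate = initial_rate
        self.alpha = alpha  # weight given to the newest measurement

    def update(self, items: int, seconds: float) -> float:
        if items > 0 and seconds > 0:
            observed = items / seconds
            self.rate = (1 - self.alpha) * self.rate + self.alpha * observed
        return self.rate


if __name__ == "__main__":
    rtl = RateEstimator(1000.0)
    sw = RateEstimator(100.0)
    for batch in (400, 530, 1100):
        print(batch, plan_split(batch, rtl.rate, sw.rate, deadline=0.5))
```

After each batch, feeding the measured item counts and elapsed times into `RateEstimator.update` lets the next call to `plan_split` react to drift in either thread's throughput, which is one way a scheduler could re-tune the SW share without stalling the faster RTL path.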
Keywords  ZYNQ · Python · Accelerators · HLS · Parallel processing · Dynamic distribution · Scheduler

Extended author information available on the last page of the article


1 Introduction

In this article, the author implements a heterogeneous computing architecture and a scheduler that dynamically distributes the load among the different processing units. A ZYNQ module from Xilinx is used for this purpose. The advantage of heterogeneous systems, including the ZYNQ architecture, is that the hardware better matches the nature of the algorithms that run on it, so both energy consumption and computing power can be optimized. A classic example is the combination of CPUs (application processors), graphics processing units (GPUs), real-time processors (RTPs), video processing units (VPUs), FPGAs (bitlev