Efficient scheduling of streams on GPGPUs
Mohamad Beheshti Roui1 · S. Kazem Shekofteh2 · Hamid Noori1 · Ahad Harati1
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Graphics processing units (GPUs) are widely used for scientific and engineering applications with a high level of parallelism. The computing power of GPUs continues to improve through enhancements to their architectural facilities. NVIDIA's compute unified device architecture (CUDA) streams, backed by Hyper-Q on NVIDIA graphics cards, are a well-established means of performance improvement. Subject to architectural capabilities, and in the absence of explicit synchronization, CUDA streams allow several kernels to run simultaneously. Experimental results show that how a set of programs is streamed affects the execution time; the stream set yielding the highest performance improvement is therefore the efficient stream set. This article proposes a framework that predicts the efficient stream set for two streams without trying all combinations, which would be a very time-consuming process. The proposed framework employs a performance model and a scheduler: the performance model estimates the duration of the concurrent portions of streamed programs, and the scheduler uses this estimate to predict the efficient stream set. The proposed prediction method relies on non-stream features of programs. The results show that even with an average performance-model error of 33%, the scheduler predicts the optimized sets with 100% precision.

Keywords Stream scheduling · Efficient CUDA streaming · Performance estimation · Efficient stream combination · GPGPU code optimization · Performance model
* Hamid Noori [email protected]

1 Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Khorasan Razavi, Iran

2 Department of Computer Engineering, Shandiz Institute of Higher Education, Mashhad, Khorasan Razavi, Iran
1 Introduction
GPUs and the field of high-performance computing (HPC) have been inseparable over the last two decades. HPC scientific and engineering problems are computationally intensive, so the GPU, with its ever-increasing compute capability, is a strong candidate in the field of HPC. GPUs consist of architectural components such as streaming multiprocessors (SMs), L2 cache, global DRAM, shared memories, registers, and processing elements; each SM in turn contains components such as processing elements, shared memory, and registers. GPU architectures differ in the number and size of each of these components, and thus the GPU architecture plays a fundamental role in its performance behavior. The primary purpose of all GPU hardware resources is to provide more data parallelism. The most common GPU programming environments are CUDA [1] and OpenCL [2]. NVIDIA developed the CUDA programming model, which is supported only on this vendor's GPUs. The approach and methodology of the current article are based on the CUDA model due to CUDA's support of concurrency on
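To illustrate the streaming mechanism the article builds on, the following is a minimal sketch (not taken from the paper; `kernelA` and `kernelB` are hypothetical, independent kernels) of launching work on two CUDA streams so that, on Hyper-Q-capable GPUs, the kernels may execute concurrently without any explicit synchronization between them:

```cuda
// Sketch: two independent kernels issued to separate CUDA streams.
// Assumptions: a CUDA-capable device is present; kernelA/kernelB are
// placeholder workloads invented for this example.
#include <cuda_runtime.h>

__global__ void kernelA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

__global__ void kernelB(float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Work issued to different streams has no implied ordering; the
    // hardware scheduler may overlap the two kernels if SM resources allow.
    kernelA<<<(n + 255) / 256, 256, 0, s1>>>(dx, n);
    kernelB<<<(n + 255) / 256, 256, 0, s2>>>(dy, n);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

Whether the two kernels actually overlap depends on the resource demands of each kernel and the device's architecture; this dependence is precisely what makes some stream sets more efficient than others.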