Design of an adaptive GPU sharing and scheduling scheme in container-based cluster


Qichen Chen1 · Jisun Oh2 · Seoyoung Kim2 · Yoonhee Kim2

Received: 1 February 2019 / Revised: 15 June 2019 / Accepted: 23 July 2019

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
Container-based virtualization is an innovative technology that accelerates software development by providing portability and maintainability of applications. Recently, a growing number of workloads, such as high performance computing (HPC) and deep learning (DL), are deployed in container-based environments. However, GPU resource management, especially the GPU memory over-subscription problem in container-based clusters, causes substantial performance loss and remains challenging. This paper proposes an adaptive fair-share method to share GPU resources effectively in a container-based virtualization environment, as well as an execution rescheduling method that manages the execution order of each container to acquire the maximum performance gain. We also propose a checkpoint-based mechanism, targeted at DL workloads running on TensorFlow, which efficiently solves the GPU memory over-subscription problem. We demonstrate that our approach contributes to overall performance improvement as well as higher resource utilization, compared to the default and static fair-share methods, with homogeneous and heterogeneous workloads. Compared to these two baselines, the proposed method reduces average execution time by 16.37% and 15.61%, and boosts average GPU memory utilization by approximately 52.46% and 10.3%, respectively. We also evaluated our checkpoint-based mechanism by running multiple CNN workloads with TensorFlow simultaneously; the results show that the proposed mechanism ensures each workload executes safely without out-of-memory (OOM) errors.

Keywords GPU resource sharing · GPU management · GPU scheduling · GPU virtualization
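The checkpoint-based mechanism itself is described later in the paper; purely as an illustrative sketch of the two stock TensorFlow 1.x facilities such a scheme can build on (capping a process's share of GPU memory, and checkpoint/restore of training state), a minimal example follows. The memory fraction, checkpoint path, and toy variables are assumptions for illustration, not the authors' implementation.

    import os
    import tensorflow as tf  # TensorFlow 1.x API, matching the paper's 2019 setting

    # Cap this process's share of GPU memory so co-located containers are not
    # over-subscribed; 0.4 is an arbitrary illustrative fraction.
    config = tf.ConfigProto(
        gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.4))

    # Toy training state so that checkpointing has something to save.
    step = tf.Variable(0, name="global_step", trainable=False)
    weights = tf.Variable(tf.zeros([256, 256]), name="weights")
    train_op = tf.assign_add(step, 1)  # stand-in for one training step

    saver = tf.train.Saver()
    ckpt_dir = "/tmp/ckpt"  # hypothetical checkpoint location
    os.makedirs(ckpt_dir, exist_ok=True)

    with tf.Session(config=config) as sess:
        latest = tf.train.latest_checkpoint(ckpt_dir)
        if latest:
            saver.restore(sess, latest)  # resume a preempted/rescheduled workload
        else:
            sess.run(tf.global_variables_initializer())
        for _ in range(5):
            sess.run(train_op)
            saver.save(sess, ckpt_dir + "/model", global_step=step)

A scheduler can layer policy on top of such primitives, deciding when a container's workload should be checkpointed, evicted, and later resumed so that concurrent workloads never exceed physical GPU memory.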

A preliminary version of this article was presented at the 3rd IEEE International Workshops on Foundations and Applications of Self* Systems, Trento, Italy, September 2018.

Corresponding author: Yoonhee Kim, [email protected]

Qichen Chen, [email protected]
Jisun Oh, [email protected]
Seoyoung Kim, [email protected]

1 Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
2 Department of Computer Science, Sookmyung Women's University, Seoul, South Korea

1 Introduction

Recently, high performance computing (HPC) and deep learning (DL) applications have come to play key roles in many different research fields. The feature common to these applications is that they all require massive computation power, which matches the highly parallel architecture of the graphics processing unit (GPU). A GPU is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device [1]. However,