Interference-aware parallelization for deep learning workload in GPU cluster


Xin Geng · Haitao Zhang · Zhengyang Zhao · Huadong Ma

Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China

Received: 27 June 2019 / Revised: 9 October 2019 / Accepted: 19 December 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

With the widespread use of GPUs for deep learning applications, the efficient execution of multiple deep learning jobs in a GPU cluster has attracted great attention. Achieving efficient workload parallelization becomes more difficult because modern GPUs support concurrent execution of multiple jobs. Traditional coarse-grained scheduling methods ignore both the interference caused by resource contention among co-executing jobs and the characteristics of deep learning jobs, which leads to unbalanced use of computing resources and degrades job performance in the GPU cluster. In this paper, we propose a two-stage workload parallelization approach for deep learning training workloads. We first propose two interference-aware prediction models: the Interference-Aware Similarity Prediction (IASP) model based on deep collaborative filtering and the Interference-Aware Performance Prediction (IAPP) model based on a deep neural network. Our parallelization approach comprises both a cluster-level and a node-level workload parallelization strategy. Specifically, the Cluster-Level Workload Parallelization (CLWP) strategy assigns deep learning jobs to appropriate worker nodes according to the proposed IASP model, and the Node-Level Workload Parallelization (NLWP) strategy places deep learning tasks on appropriate GPUs according to the proposed IAPP model and the communication costs among tasks. We evaluate our workload parallelization strategy on a prototype platform against other widely used methods. The experimental results show that the proposed strategy improves GPU utilization by 18% on average and reduces job completion time by around 22%.

Keywords: Deep learning · Workload parallelization · Deep collaborative filtering · Deep neural networks · Interference aware
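The two-stage flow described above can be illustrated with a minimal sketch. The Python code below is not the authors' implementation: the IASP and IAPP models are replaced by toy placeholder scores, the cluster is a toy in-memory model, and all names (iasp_score, iapp_score, comm_cost, schedule) are hypothetical stand-ins introduced only to show how the cluster-level (CLWP) and node-level (NLWP) decisions fit together.

# Minimal illustrative sketch of the two-stage parallelization flow.
# Stage 1 (CLWP): assign each job to a node using an interference-aware
# similarity score (stand-in for the IASP model).
# Stage 2 (NLWP): place the job's tasks on GPUs using a predicted-performance
# score minus a communication cost (stand-ins for the IAPP model and the
# task communication model). All scoring functions are toy placeholders.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GPU:
    node_id: int
    gpu_id: int
    tasks: List[str] = field(default_factory=list)


@dataclass
class Node:
    node_id: int
    gpus: List[GPU]


def iasp_score(job: str, node: Node) -> float:
    # Placeholder for IASP: higher means the job is expected to suffer less
    # interference from the jobs already running on this node.
    return 1.0 / (1.0 + sum(len(g.tasks) for g in node.gpus))


def iapp_score(task: str, gpu: GPU) -> float:
    # Placeholder for IAPP: predicted performance of the task when co-located
    # with the tasks already placed on this GPU.
    return 1.0 / (1.0 + len(gpu.tasks))


def comm_cost(job: str, gpu: GPU, node: Node) -> float:
    # Toy communication cost: sibling tasks of the same job already placed on
    # other GPUs of the node imply cross-GPU traffic.
    return 0.1 * sum(t.startswith(job) for g in node.gpus if g is not gpu for t in g.tasks)


def schedule(jobs: Dict[str, List[str]], cluster: List[Node]) -> None:
    for job, tasks in jobs.items():
        # Stage 1 (CLWP): choose the least-interfering node for the whole job.
        node = max(cluster, key=lambda n: iasp_score(job, n))
        for task in tasks:
            # Stage 2 (NLWP): choose the GPU balancing predicted performance
            # against communication cost among the job's tasks.
            gpu = max(node.gpus, key=lambda g: iapp_score(task, g) - comm_cost(job, g, node))
            gpu.tasks.append(task)
            print(f"{task} -> node {node.node_id}, GPU {gpu.gpu_id}")


if __name__ == "__main__":
    cluster = [Node(i, [GPU(i, j) for j in range(2)]) for i in range(2)]
    schedule({"jobA": ["jobA:t0", "jobA:t1"], "jobB": ["jobB:t0"]}, cluster)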

1 Introduction

During the last few years, deep learning has been widely used to handle challenging problems such as image classification [1–3] and speech recognition [4–6]. Moreover, GPUs provide high computing power for these compute-intensive Deep Learning (DL) workloads. In a GPU cluster, multiple DL applications are trained on shared GPU resources to improve execution efficiency. A critical issue in the GPU cluster is how to schedule multiple DL workloads to achieve optimal system performance. This problem becomes more challenging with