Towards an optimized distributed deep learning framework for a heterogeneous multi-GPU cluster
Youngrang Kim1 · Hyeonseong Choi1 · Jaehwan Lee1 · Jik-Soo Kim2 · Hyunseung Jei3 · Hongchan Roh3
Received: 29 November 2019 / Revised: 3 May 2020 / Accepted: 21 June 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
This paper presents a novel "Distributed Deep Learning Framework" for a heterogeneous multi-GPU cluster that can effectively improve overall resource utilization without sacrificing training accuracy. Specifically, we employ a hybrid aggregation approach that combines parameter-server and all-reduce schemes in order to address potential performance degradation when running deep learning applications on a heterogeneous computing system. In addition, we design and implement an asynchronous large mini-batch training mechanism that maintains training accuracy for asynchronous data-parallel deep learning, with enhanced collective communication capability based on MPI. We implement our proposed framework on TensorFlow and perform extensive experiments on both homogeneous and heterogeneous computing systems. Evaluation results show that our proposed framework can improve computing performance by reducing I/O bottlenecks and effectively increasing resource utilization in the heterogeneous multi-GPU cluster.

Keywords Data parallel · Distributed deep learning · Heterogeneous cluster · Large-scale deep learning
Jaehwan Lee (corresponding author) [email protected]
Youngrang Kim [email protected]
Hyeonseong Choi [email protected]
Jik-Soo Kim [email protected]
Hyunseung Jei [email protected]
Hongchan Roh [email protected]

1 Korea Aerospace University, Goyang-si, Republic of Korea
2 Myongji University, Yongin-si, Republic of Korea
3 SK Telecom ML Infra Lab., Seongnam-si, Republic of Korea

1 Introduction

Recently, distributed deep learning frameworks have been proposed [1] to accelerate overall deep learning computations by exploiting multiple GPUs and multiple computing nodes. Typically, distributed deep learning mechanisms can be classified into asynchronous and synchronous aggregation based on the execution timing of the aggregation operations. They can be further categorized into parameter-server [2] and all-reduce [3] schemes depending on how data is exchanged for aggregation among training workers. However, employing combinations of these distributed deep learning mechanisms on top of a heterogeneous multi-GPU cluster may result in low computing resource utilization. In the case of synchronous training, workers may have to wait a substantial amount of time for relatively slow workers (stragglers), which lowers overall computing performance. To address this problem, Ho et al. proposed the Stale-Synchronous Parallel Parameter Server [4], which specifies a staleness threshold: each worker keeps the difference between its own number of training iterations and that of the slowest worker below the threshold.
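The staleness-bounded progress rule behind the Stale-Synchronous Parallel scheme can be sketched as follows. This is an illustrative simplification, not code from the paper or from [4]; the function and variable names (`can_proceed`, `worker_iters`, `staleness_threshold`) are our own.

```python
def can_proceed(worker_iters, worker_id, staleness_threshold):
    """Return True if the given worker may start its next iteration.

    Under SSP, a worker may run ahead of the slowest worker by at most
    `staleness_threshold` iterations; otherwise it must wait.

    worker_iters: list of current iteration counts, one per worker.
    """
    slowest = min(worker_iters)
    return worker_iters[worker_id] - slowest <= staleness_threshold


# Worker 0 is at iteration 12, worker 1 (the straggler) at iteration 10.
iters = [12, 10]
print(can_proceed(iters, 0, 3))  # 2 iterations ahead, threshold 3 -> True
print(can_proceed(iters, 0, 1))  # 2 iterations ahead, threshold 1 -> False
```

With a threshold of 0 this rule degenerates to fully synchronous training (every worker waits for the slowest), while an unbounded threshold recovers fully asynchronous training; the threshold thus trades straggler tolerance against gradient staleness.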