Comprehensive techniques of multi-GPU memory optimization for deep learning acceleration
Youngrang Kim 1 • Jaehwan Lee 1 • Jik-Soo Kim 2 • Hyunseung Jei 3 • Hongchan Roh 3

1 Korea Aerospace University, Goyang-si, Republic of Korea
2 Myongji University, Yongin-si, Republic of Korea
3 SK Telecom ML Infra Lab, Seongnam-si, Republic of Korea

Corresponding author: Jaehwan Lee ([email protected])

Received: 28 December 2018 / Revised: 2 May 2019 / Accepted: 14 August 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract

This paper presents a comprehensive suite of techniques for optimized memory management in multi-GPU systems to accelerate the execution of deep learning applications. We employ hybrid utilization of GPU and CPU memories in a multi-GPU environment by effectively addressing contention issues in the shared interconnect (e.g., PCIe, NVLink). In addition, we designed and implemented an intelligent prefetching algorithm (from CPU memory to the GPU) that achieves the highest processing throughput while sustaining a large mini-batch size. We implemented our optimization techniques on TensorFlow and performed extensive experiments in various multi-GPU environments, including traditional PCIe and the latest high-bandwidth interconnect, NVLink. Evaluation results show that our proposed scheme improves computing performance by reducing the I/O bottleneck and effectively increases the mini-batch size without sacrificing overall training throughput.

Keywords: Convolutional neural network • GPGPU • Multi-GPU • Mini-batch
1 Introduction

Convolutional neural networks (CNNs) use convolution layers to extract features from input data and perform training using those features [1]. They have been widely adopted in deep learning frameworks. With the advent of the increased computing power of general-purpose GPUs (GPGPUs), the parallel operations in a CNN can be effectively accelerated. However, due to physical limitations in the amount of
available GPU memory, it is not always possible to compute large-batch input data or large CNN models. In a typical CNN, the feature map data, which are the outputs of the convolution layers, occupy the largest portion of GPGPU memory. Feature map data are generated during feed-forwarding; however, they are not used in any actual computation again until they are reused during backward propagation. Therefore, feature map data can stay in GPU memory for a relatively long time without being used, until the backward-propagation process begins. To address this problem, NVIDIA proposed the virtualized deep neural network (vDNN) [2], a runtime memory management system that virtualizes GPU and CPU memory usage. To overcome the physical limitation of available GPGPU memory, vDNN swaps out to CPU memory the feature map data that normally remain in GPU memory for reuse but are not immediately required for the current computation.
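The swap-out/prefetch idea behind this style of memory management can be illustrated with a short CUDA sketch. This is a minimal illustration under assumptions, not the paper's actual TensorFlow implementation: the feature-map size, the single offloaded layer, and the buffer names are hypothetical. The mechanisms shown, a pinned host buffer, cudaMemcpyAsync on a dedicated copy stream so transfers overlap with kernel execution, and CUDA events so the backward pass does not start before its feature map has been prefetched back, are the standard building blocks such a scheme relies on.

    // Sketch: offload one layer's feature map to pinned CPU memory during the
    // forward pass and prefetch it back to GPU memory before the backward pass.
    // Buffer sizes are assumed; error checking is omitted for brevity.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const size_t bytes = 256ull << 20;          // assumed 256 MiB feature map

        float *d_fmap, *h_fmap;
        cudaMalloc(&d_fmap, bytes);                 // feature map produced on the GPU
        cudaHostAlloc(&h_fmap, bytes, cudaHostAllocDefault);  // pinned host buffer

        cudaStream_t compute, copy;
        cudaStreamCreate(&compute);
        cudaStreamCreate(&copy);

        cudaEvent_t offloaded, prefetched;
        cudaEventCreate(&offloaded);
        cudaEventCreate(&prefetched);

        // Forward pass: layer kernels run on 'compute'; once this layer's output
        // is ready, swap it out to CPU memory asynchronously on 'copy' so the
        // transfer overlaps with the remaining forward computation.
        cudaMemcpyAsync(h_fmap, d_fmap, bytes, cudaMemcpyDeviceToHost, copy);
        cudaEventRecord(offloaded, copy);
        // After 'offloaded' completes, d_fmap could be reused for later layers.

        // Backward pass: prefetch the feature map back before the layer's
        // gradient kernel needs it, overlapping the copy with earlier gradients.
        cudaStreamWaitEvent(copy, offloaded, 0);    // offload must have finished
        cudaMemcpyAsync(d_fmap, h_fmap, bytes, cudaMemcpyHostToDevice, copy);
        cudaEventRecord(prefetched, copy);

        // The gradient kernel on 'compute' waits until the prefetch is done.
        cudaStreamWaitEvent(compute, prefetched, 0);
        // (backward kernel for this layer would be launched on 'compute' here)

        cudaStreamSynchronize(compute);
        cudaStreamSynchronize(copy);
        printf("offload/prefetch sketch completed\n");

        cudaFree(d_fmap);
        cudaFreeHost(h_fmap);
        return 0;
    }

The design point this sketch highlights is the one the paper builds on: transfers over the shared interconnect are issued asynchronously and early enough that, ideally, the GPU never stalls waiting for a feature map, which is exactly where interconnect contention and prefetch timing become critical in a multi-GPU setting.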