Training deep neural networks: a static load balancing approach



Sergio Moreno‑Álvarez1 · Juan M. Haut2 · Mercedes E. Paoletti2 · Juan A. Rico‑Gallego1 · Juan C. Díaz‑Martín2 · Javier Plaza2

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract  Deep neural networks are currently trained under data-parallel setups on high-performance computing (HPC) platforms, where a replica of the full model is assigned to each computational resource and trained on non-overlapping subsets of the data known as batches. Replicas combine their computed gradients to update their local copies of the parameters at the end of each batch. However, differences in the performance of the resources assigned to replicas in current heterogeneous platforms induce waiting times when gradients are combined synchronously, degrading overall performance. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: the training in each replica is computed using a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, hence minimizing the staleness problem. Our experimental results (obtained in the context of a remotely sensed hyperspectral image processing application) show that, while the classification accuracy is kept constant, the training time decreases substantially with respect to unbalanced training. This is illustrated using heterogeneous computing platforms made up of CPUs and GPUs with different performance.

Keywords  Deep learning · High-performance computing · Distributed training · Heterogeneous platforms
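The proportional batch-size assignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the per-replica throughput estimates, and the example numbers are assumptions used only to show how a global batch might be split according to relative computing capacity.

```python
# Minimal sketch (assumed, not from the paper): give each replica a local
# batch size proportional to its measured relative computing capacity, so
# that all replicas finish a batch at roughly the same time.

def proportional_batch_sizes(global_batch, relative_speeds):
    """Split a global batch among replicas proportionally to their speeds.

    relative_speeds: per-replica throughput estimates (e.g., samples/second),
    assumed to come from benchmarking the model on each device beforehand.
    """
    total = sum(relative_speeds)
    # Initial proportional share, rounded down.
    sizes = [int(global_batch * s / total) for s in relative_speeds]
    # Hand the leftover samples to the fastest replicas, one each.
    remainder = global_batch - sum(sizes)
    fastest_first = sorted(range(len(sizes)),
                           key=lambda i: relative_speeds[i], reverse=True)
    for idx in fastest_first[:remainder]:
        sizes[idx] += 1
    return sizes


if __name__ == "__main__":
    # Hypothetical heterogeneous platform: two GPUs and two CPUs.
    speeds = [300.0, 220.0, 40.0, 25.0]          # samples/second per replica
    print(proportional_batch_sizes(512, speeds))  # -> [263, 193, 35, 21]
```

Because every replica still processes its local batch within the same global batch, the gradient combination step remains synchronous; the balancing only equalizes the time each replica needs to reach that synchronization point.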

* Sergio Moreno‑Álvarez, [email protected]. Extended author information is available on the last page of the article.


1 Introduction

Deep learning (DL) algorithms based on neural network architectures [19] have achieved great accuracy in areas such as image classification [17, 20] and speech recognition [5], among others. Compared with other machine learning (ML) and pattern recognition methods, deep neural networks (DNNs) work as universal approximators of parameterized maps (models) composed of stacks of layers [12], where each layer comprises several nodes (neurons) connected to the nodes of the preceding and subsequent layers through synaptic weights and saturation-control biases [22]. Overall, DNN models fit neuron weights and biases through an iterative optimization process based on training with examples. Improvements with respect to traditional techniques are supported by the large amount of data available to train these models, as well as by advances in high-performance computing (HPC) platforms [9]. DNN learning strategies can be roughly classified into supervised and unsupervised learning [14], depend