Distributed B-SDLM: Accelerating the Training Convergence of Deep Neural Networks Through Parallelism




1 VeCAD Research Laboratory, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
[email protected], [email protected]
2 Machine Learning Developer Group, Sightline Innovation, #202, 435 Ellice Avenue, Winnipeg, MB R3B 1Y6, Canada
[email protected]

Abstract. This paper proposes an efficient asynchronous stochastic second order learning algorithm for distributed learning of neural networks (NNs). The proposed algorithm, named distributed bounded stochastic diagonal Levenberg-Marquardt (distributed B-SDLM), is based on the B-SDLM algorithm, which converges fast and requires only minimal computational overhead compared with the stochastic gradient descent (SGD) method. The proposed algorithm is implemented based on the parameter server thread model in the MPICH implementation. Experiments on the MNIST dataset show that training with distributed B-SDLM on a 16-core CPU cluster allows the convolutional neural network (CNN) model to reach the convergence state very quickly, with speedups of 6.03× and 12.28× to reach training and testing loss values of 0.01 and 0.08, respectively. This also results in significantly less time to reach a given classification accuracy (5.67× and 8.72× faster to reach 99 % training and 98 % testing accuracies on the MNIST dataset, respectively).

Keywords: Deep learning · Distributed machine learning · Stochastic diagonal Levenberg-Marquardt · Convolutional neural network
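The asynchronous parameter-server scheme mentioned in the abstract can be illustrated with a toy sketch. Everything below is an illustrative assumption: the Python threads, the `ParameterServer` class, and the least-squares problem are stand-ins for exposition only; the paper's actual implementation uses the parameter server thread model in MPICH, not this code.

```python
import threading
import numpy as np

# Toy sketch of the parameter-server pattern: one shared parameter
# store and several worker threads that each compute gradients on
# their own data shard and push updates asynchronously, without
# waiting for the other workers. Illustrative only; not the paper's
# MPICH-based implementation.

class ParameterServer:
    def __init__(self, dim, lr=0.05):
        self.params = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def push(self, grad):
        # Apply a worker's gradient as soon as it arrives (asynchronous SGD).
        with self.lock:
            self.params -= self.lr * grad

    def pull(self):
        with self.lock:
            return self.params.copy()

def worker(server, shard, steps):
    # Least-squares toy problem: minimize ||X w - y||^2 on this shard.
    X, y = shard
    for _ in range(steps):
        w = server.pull()          # possibly stale parameters
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        server.push(grad)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
server = ParameterServer(dim=2)
threads = []
for _ in range(4):                 # four workers, each with its own shard
    X = rng.normal(size=(64, 2))
    t = threading.Thread(target=worker, args=(server, (X, X @ w_true), 200))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print(server.pull())               # converges close to w_true
```

Because the targets here are noise-free, the asynchronous updates still contract toward the true weights even though workers read stale parameters; with noisy gradients, staleness is one of the factors a practical asynchronous scheme has to tolerate.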

1 Introduction

Deep learning (DL) is a branch of machine learning (ML) that learns deeper abstractions of meaningful features by constructing hierarchical models that perform nonlinear transformations [2]. However, training such complex models is extremely computationally expensive and difficult. This motivates the development of distributed ML techniques, which aim to accelerate the training process through parallelism. The idea of distributed ML is to spread the training process across multiple processing units or machines on a parallel or distributed computing platform [3]. Distributed versions of the learning algorithms have been developed to

© Springer International Publishing Switzerland 2016. R. Booth and M.-L. Zhang (Eds.): PRICAI 2016, LNAI 9810, pp. 243–250, 2016. DOI: 10.1007/978-3-319-42911-3_20

244    S.S. Liew et al.

train the DL models in the distributed ML environment. Common distributed learning algorithms are usually derived from conventional first order methods (particularly SGD) [3]. However, first order learning algorithms are known to be inefficient because of their slow convergence; second order algorithms can converge much faster [6]. Research reported in [1,3] has applied second order learning algorithms to distributed ML in batch learning mode; however, in most cases they did not outperform distributed SGD. Some distributed learning algorithms, such as those proposed in [3,8], are effective in training deep models but too computationally expensive. Therefore, this paper aims to improve on the exist
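To make the first order versus second order contrast concrete, the sketch below compares plain SGD with an SDLM-style update, in which each parameter i gets its own learning rate eta / (h_i + mu), where h_i estimates the i-th diagonal Hessian entry and mu is a damping constant. The quadratic test problem, the exact diagonal Hessian, and the constants are illustrative assumptions, not the paper's formulation; B-SDLM additionally bounds the resulting per-parameter learning rates (the "B" in its name), which this sketch omits.

```python
import numpy as np

# Illustrative sketch of the stochastic diagonal Levenberg-Marquardt
# (SDLM) idea: scale each coordinate's step by the inverse of its
# estimated curvature, damped by mu so flat directions do not blow up.

def sdlm_step(w, grad, h_diag, eta=0.1, mu=1e-3):
    # Per-parameter learning rates: high curvature -> small step,
    # low curvature -> large (but damped) step.
    return w - (eta / (h_diag + mu)) * grad

# Quadratic loss 0.5 * sum(c_i * w_i^2) with very different curvatures
# per coordinate. A single global SGD rate must stay below 2/100 to be
# stable in the steep direction, which makes the flat direction crawl.
c = np.array([100.0, 1.0])          # diagonal Hessian of the toy loss
w_sdlm = np.array([1.0, 1.0])
w_sgd = np.array([1.0, 1.0])
for _ in range(50):
    w_sdlm = sdlm_step(w_sdlm, grad=c * w_sdlm, h_diag=c)
    w_sgd = w_sgd - 0.01 * (c * w_sgd)   # plain SGD, one global rate
print(w_sdlm, w_sgd)
```

After 50 steps the SDLM-style update has driven both coordinates near zero, while SGD's flat coordinate is still far from the optimum; curvature-aware per-parameter rates are what let second order methods converge in far fewer updates.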