Distributed B-SDLM: Accelerating the Training Convergence of Deep Neural Networks Through Parallelism




1 VeCAD Research Laboratory, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
[email protected], [email protected]
2 Machine Learning Developer Group, Sightline Innovation, #202, 435 Ellice Avenue, Winnipeg, MB R3B 1Y6, Canada
[email protected]

Abstract. This paper proposes an efficient asynchronous stochastic second order learning algorithm for distributed learning of neural networks (NNs). The proposed algorithm, named distributed bounded stochastic diagonal Levenberg-Marquardt (distributed B-SDLM), is based on the B-SDLM algorithm, which converges fast and requires only minimal computational overhead compared with the stochastic gradient descent (SGD) method. The proposed algorithm is implemented based on the parameter server thread model in the MPICH implementation. Experiments on the MNIST dataset show that training with distributed B-SDLM on a 16-core CPU cluster allows the convolutional neural network (CNN) model to reach the convergence state very quickly, with speedups of 6.03× and 12.28× to reach training and testing loss values of 0.01 and 0.08, respectively. This also results in significantly less time to reach a given classification accuracy (5.67× and 8.72× faster to reach 99 % training and 98 % testing accuracies on the MNIST dataset, respectively).

Keywords: Deep learning · Distributed machine learning · Stochastic diagonal Levenberg-Marquardt · Convolutional neural network
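The asynchronous parameter-server scheme mentioned in the abstract can be illustrated with a toy sketch. Everything below is an illustrative assumption: the Python threads, the `ParameterServer` class, and the least-squares problem are stand-ins for exposition only; the paper's actual implementation uses the parameter server thread model in MPICH, not this code.

```python
import threading
import numpy as np

# Toy sketch of the parameter-server pattern: one shared parameter
# store and several worker threads that each compute gradients on
# their own data shard and push updates asynchronously, without
# waiting for the other workers. Illustrative only; not the paper's
# MPICH-based implementation.

class ParameterServer:
    def __init__(self, dim, lr=0.05):
        self.params = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def push(self, grad):
        # Apply a worker's gradient as soon as it arrives (asynchronous SGD).
        with self.lock:
            self.params -= self.lr * grad

    def pull(self):
        with self.lock:
            return self.params.copy()

def worker(server, shard, steps):
    # Least-squares toy problem: minimize ||X w - y||^2 on this shard.
    X, y = shard
    for _ in range(steps):
        w = server.pull()          # possibly stale parameters
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        server.push(grad)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
server = ParameterServer(dim=2)
threads = []
for _ in range(4):                 # four workers, each with its own shard
    X = rng.normal(size=(64, 2))
    t = threading.Thread(target=worker, args=(server, (X, X @ w_true), 200))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print(server.pull())               # converges close to w_true
```

Because the targets here are noise-free, the asynchronous updates still contract toward the true weights even though workers read stale parameters; with noisy gradients, staleness is one of the factors a practical asynchronous scheme has to tolerate.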

1 Introduction

Deep learning (DL) is a branch of machine learning (ML) that learns deeper abstractions of meaningful features by constructing hierarchical models that perform nonlinear transformations [2]. However, training such complex models is extremely computationally expensive and difficult. This motivates the development of distributed ML techniques, which aim to accelerate the training process through parallelism. The idea of distributed ML is to spread the training process across multiple processing units or machines on a parallel or distributed computing platform [3]. Distributed versions of the learning algorithms have been developed to

© Springer International Publishing Switzerland 2016. R. Booth and M.-L. Zhang (Eds.): PRICAI 2016, LNAI 9810, pp. 243–250, 2016. DOI: 10.1007/978-3-319-42911-3_20

244    S.S. Liew et al.

train the DL models in the distributed ML environment. Common distributed learning algorithms are usually derived from conventional first order methods (particularly SGD) [3]. However, first order learning algorithms are known to be inefficient because of their slow convergence; second order algorithms can converge much faster [6]. Research reported in [1,3] has applied second order learning algorithms to distributed ML in batch learning mode; however, in most cases they did not outperform distributed SGD. Some distributed learning algorithms, such as those proposed in [3,8], are effective in training deep models but too computationally expensive. Therefore, this paper aims to improve on the exist
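To make the first order versus second order contrast concrete, the sketch below compares plain SGD with an SDLM-style update, in which each parameter i gets its own learning rate eta / (h_i + mu), where h_i estimates the i-th diagonal Hessian entry and mu is a damping constant. The quadratic test problem, the exact diagonal Hessian, and the constants are illustrative assumptions, not the paper's formulation; B-SDLM additionally bounds the resulting per-parameter learning rates (the "B" in its name), which this sketch omits.

```python
import numpy as np

# Illustrative sketch of the stochastic diagonal Levenberg-Marquardt
# (SDLM) idea: scale each coordinate's step by the inverse of its
# estimated curvature, damped by mu so flat directions do not blow up.

def sdlm_step(w, grad, h_diag, eta=0.1, mu=1e-3):
    # Per-parameter learning rates: high curvature -> small step,
    # low curvature -> large (but damped) step.
    return w - (eta / (h_diag + mu)) * grad

# Quadratic loss 0.5 * sum(c_i * w_i^2) with very different curvatures
# per coordinate. A single global SGD rate must stay below 2/100 to be
# stable in the steep direction, which makes the flat direction crawl.
c = np.array([100.0, 1.0])          # diagonal Hessian of the toy loss
w_sdlm = np.array([1.0, 1.0])
w_sgd = np.array([1.0, 1.0])
for _ in range(50):
    w_sdlm = sdlm_step(w_sdlm, grad=c * w_sdlm, h_diag=c)
    w_sgd = w_sgd - 0.01 * (c * w_sgd)   # plain SGD, one global rate
print(w_sdlm, w_sgd)
```

After 50 steps the SDLM-style update has driven both coordinates near zero, while SGD's flat coordinate is still far from the optimum; curvature-aware per-parameter rates are what let second order methods converge in far fewer updates.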