Multi-GPU Based Recurrent Neural Network Language Model Training



Abstract. Recurrent neural network language models (RNNLMs) have been applied in a wide range of research fields, including natural language processing and speech recognition. One challenge in training RNNLMs is the heavy computational cost of the crucial back-propagation (BP) algorithm. This paper presents an effective approach to training recurrent neural networks on multiple GPUs, where parallelized stochastic gradient descent (SGD) is applied. Results on text-based experiments show that the proposed approach achieves a 3.4× speedup on 4 GPUs over a single GPU, without any performance loss in language model perplexity.

1 Introduction

Statistical language models (LMs) play a vital role in modern natural language processing (NLP) systems. To address the data sparsity problem of conventional n-gram LMs, neural network language models (NNLMs) [1–6] have been proposed; these use a shallow neural network to extract features from input words and the back-propagation (BP) algorithm [14] to update the network parameters. Depending on the network architecture, NNLMs can be classified into two groups: feed-forward NNLMs [1,2], which consider a context of fixed, finite length, and recurrent NNLMs (RNNLMs) [3–6], which use a recurrent vector to preserve longer, variable-length context. In recent years, RNNLMs have been reported to obtain significant performance improvements over n-gram LMs and feed-forward NNLMs on a wide range of tasks [3,4].

A practical issue in building language models based on recurrent neural networks (RNNs) is the heavy computational complexity of training. On typical tasks with huge quantities of input data, training takes days or even weeks, which is impractical for real-world use. One direction for accelerating RNN training is to run a parallel algorithm on many-core machines or clusters. However, although the BP algorithm plays a vital role in RNN training, it is difficult to parallelize effectively, because BP involves a full model update after each iteration.

This paper presents an effective data-parallel approach to training RNNs on multiple GPUs, using a ring-oriented graph structure. Experimental results on the well-known Penn Treebank Wall Street Journal corpus show a 3.4× speedup on a cluster with 4 GPUs compared with a single one, without degrading the performance of the trained language model.

The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 describes the architecture of typical recurrent neural networks and the critical factors in their training. Section 4 introduces the parallel RNN training approach and shows how to apply it on multiple GPUs. Section 5 presents experimental results on a text-based dataset. Conclusions and future work are discussed in Sect. 6.

© Springer Science+Business Media Singapore 2016. W. Che et al. (Eds.): ICYCSEE 2016, Part I, CCIS 623, pp. 484–493, 2016. DOI: 10.1007/978-981-10-2053-7_43
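To make the ring-oriented data-parallel idea concrete, the following is a minimal CPU-side sketch (not the authors' implementation) of how per-GPU gradients can be averaged by circulating chunks around a logical ring, so that each worker only ever exchanges data with its neighbour. The function name `ring_allreduce` and the list-of-floats representation of gradients are illustrative assumptions.

```python
def split_chunks(vec, n):
    """Split a flat gradient vector into n nearly equal contiguous chunks."""
    k, r = divmod(len(vec), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(vec[start:end])
        start = end
    return chunks

def ring_allreduce(grads):
    """Average equal-length per-worker gradient vectors over a logical ring.

    grads: list with one gradient (list of floats) per worker/GPU.
    Returns the averaged gradient as seen by each worker.
    """
    n = len(grads)
    chunks = [split_chunks(g, n) for g in grads]
    # Phase 1 (reduce-scatter): in step s, worker i passes chunk (i - s) % n
    # to its ring neighbour (i + 1) % n, which adds it to its own copy.
    for s in range(n - 1):
        for i in range(n):
            c = (i - s) % n
            nbr = (i + 1) % n
            chunks[nbr][c] = [a + b for a, b in zip(chunks[nbr][c], chunks[i][c])]
    # After phase 1, worker i holds the fully summed chunk (i + 1) % n.
    # Phase 2 (all-gather): circulate the summed chunks around the ring
    # so every worker ends up holding all of them.
    for s in range(n - 1):
        for i in range(n):
            c = (i + 1 - s) % n
            chunks[(i + 1) % n][c] = chunks[i][c]
    # Each worker divides by n to obtain the averaged gradient.
    return [[x / n for ch in worker for x in ch] for worker in chunks]
```

Because every worker sends and receives only one chunk (1/n of the model) per step, the per-worker bandwidth stays roughly constant as GPUs are added, which is why ring-structured exchanges scale better than having all workers ship full gradients to a single parameter server.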