Sparse Bayesian Recurrent Neural Networks

Abstract. Recurrent neural networks (RNNs) have recently gained renewed attention from the machine learning community as effective methods for modeling variable-length sequences. Language modeling, handwriting recognition, and speech recognition are only a few of the application domains where RNN-based models have achieved the state-of-the-art performance currently reported in the literature. Typically, RNN architectures utilize simple linear, logistic, or softmax output layers to perform data modeling and prediction generation. In this work, for the first time in the literature, we consider using a sparse Bayesian regression or classification model as the output layer of RNNs, inspired by the automatic relevance determination (ARD) technique. The notion of ARD is to continually create new components while detecting when a component starts to overfit, where overfitting manifests itself as a precision hyperparameter posterior tending to infinity. This way, our method manages to train sparse RNN models, where the number of effective (“active”) recurrently connected hidden units is selected in a data-driven fashion, as part of the model inference procedure. We develop efficient and scalable training algorithms for our model under the stochastic variational inference paradigm, and derive elegant predictive density expressions with computational costs comparable to conventional RNN formulations. We evaluate our approach considering its application to challenging tasks dealing with both regression and classification problems, and exhibit its favorable performance over the state-of-the-art.
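For reference, the pruning mechanism alluded to above can be illustrated with the standard ARD construction; the notation below (weights w_i, precisions α_i, Gamma hyperparameters a, b) is introduced here purely for illustration and need not coincide with the exact hierarchical prior developed in this work. Each weight is tied to its own prior precision,

p(w_i \mid \alpha_i) = \mathcal{N}(w_i \mid 0, \alpha_i^{-1}), \qquad p(\alpha_i) = \mathrm{Gamma}(\alpha_i \mid a, b),

so that, whenever the posterior over a precision hyperparameter \alpha_i tends to infinity, the corresponding weight w_i is driven to zero and the associated component (here, a recurrently connected hidden unit) is effectively pruned from the model.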

1 Introduction

Many naturally occurring phenomena such as music, speech, or human motion are inherently sequential. As a consequence, the problem of sequential data modeling is an important area of machine learning research. Recurrent neural networks (RNNs) [22] are among the most powerful models for sequential data modeling. As shown in [12], RNNs possess the desirable property of being universal approximators, as they are capable of representing any measurable sequence-to-sequence mapping to arbitrary accuracy. RNNs incorporate an internal memory module designed with the goal of summarizing the entire sequence history in the form of high-dimensional state vector representations. This architectural choice allows RNNs to model and represent long-term dependencies in the observed data, which is a crucial merit in the context of sequential data modeling applications.
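For concreteness, a standard (Elman-type) recurrence of the kind described above can be written as follows; the notation is introduced here for illustration and need not match the formulation developed in the sequel:

h_t = \sigma(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad \hat{y}_t = g(W_{hy} h_t + b_y),

where x_t is the observation at time t, h_t is the high-dimensional state vector summarizing the sequence history up to time t, \sigma(\cdot) is an elementwise nonlinearity such as the hyperbolic tangent, and g(\cdot) is the output mapping, conventionally a linear, logistic, or softmax layer.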

A major challenge confronting RNN-based architectures is that gradient-based optimization often results in error signals either blowing up or decaying exponentially for events many time steps apart, rendering RNN training largely impractical [6,19]. A great deal of research work has been devoted to the amelioration of these difficulties, usually referred