A review on the long short-term memory model
Greg Van Houdt1 · Carlos Mosquera2 · Gonzalo Nápoles1,3
© Springer Nature B.V. 2020
Abstract
Long short-term memory (LSTM) has transformed both machine learning and neurocomputing fields. According to several online sources, this model has improved Google's speech recognition, greatly improved machine translations on Google Translate, and improved the answers produced by Amazon's Alexa. This neural system is also employed by Facebook, reaching over 4 billion LSTM-based translations per day as of 2017. Interestingly, recurrent neural networks had shown a rather modest performance until LSTM showed up. One reason for the success of this recurrent network lies in its ability to handle the exploding/vanishing gradient problem, which remains a difficult issue to circumvent when training recurrent or very deep neural networks. In this paper, we present a comprehensive review that covers LSTM's formulation and training, relevant applications reported in the literature, and code resources implementing this model for a toy example.

Keywords Recurrent neural networks · Vanishing/exploding gradient · Long short-term memory · Deep learning
Corresponding authors: Greg Van Houdt ([email protected]); Gonzalo Nápoles ([email protected])

1 Faculty of Business Economics, Hasselt University, Agoralaan gebouw D, 3590 Diepenbeek, Belgium
2 Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 9, 1050 Brussels, Belgium
3 Department of Cognitive Science & Artificial Intelligence, Tilburg University, Warandelaan 2, 5037 AB Tilburg, The Netherlands

1 Introduction

Recurrent or very deep neural networks are difficult to train, as they often suffer from the exploding/vanishing gradient problem (Hochreiter 1991; Kolen and Kremer 2001). To overcome this shortcoming when learning long-term dependencies, the LSTM architecture (Hochreiter and Schmidhuber 1997a) was introduced. The learning ability of LSTM has impacted several fields from both a practical and a theoretical perspective, making it a state-of-the-art model. This led to the model being used by Google for its speech recognition (Sak et al. 2015) and to improve machine translations on Google Translate (Wu et al. 2016; Metz 2016). Amazon employs the model to improve Alexa's functionalities (Vogels 2016), and Facebook uses it for over 4 billion LSTM-based translations per day as of 2017 (Pino et al. 2017). Due to its high applicability and popularity, this neural architecture has also found its way into the world of gaming. For example, Google's DeepMind created AlphaStar (The AlphaStar Team 2019b), an artificial intelligence designed to play Starcraft II. Throughout its development, AlphaStar started to master the game (The AlphaStar Team 2019a), climbing up the global rankings to a degree unseen before. Research in this field is of course not limited to Starcraft II, as the research interest spans the entire RTS gaming genre due to its complexity (Zhang et al. 2019e).
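To give a flavour of the kind of toy example targeted by the code resources discussed later in this review, the snippet below trains a small LSTM on a synthetic sequence-regression task. It is a minimal sketch assuming TensorFlow/Keras is available; the dataset, layer sizes and hyperparameters are illustrative choices and are not taken from the paper.

    # Minimal LSTM toy example (illustrative only, assumes TensorFlow/Keras).
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # Synthetic task: predict the sum of each random sequence.
    rng = np.random.default_rng(0)
    X = rng.random((1000, 10, 1))   # 1000 sequences, 10 time steps, 1 feature
    y = X.sum(axis=1)               # target: sum over the time dimension

    model = keras.Sequential([
        layers.LSTM(32, input_shape=(10, 1)),  # LSTM layer with 32 hidden units
        layers.Dense(1),                       # regression output
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)

    print(model.predict(X[:1]))     # prediction for the first toy sequence

Because the LSTM cell propagates information through an additive cell state, gradients can flow across the ten time steps of this toy task without vanishing, which is precisely the property motivating the architecture reviewed here.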