A review on the long short-term memory model
Greg Van Houdt1 · Carlos Mosquera2 · Gonzalo Nápoles1,3
© Springer Nature B.V. 2020
Abstract
Long short-term memory (LSTM) has transformed both machine learning and neurocomputing fields. According to several online sources, this model has improved Google's speech recognition, greatly improved machine translations on Google Translate, and improved the answers produced by Amazon's Alexa. This neural system is also employed by Facebook, reaching over 4 billion LSTM-based translations per day as of 2017. Interestingly, recurrent neural networks had shown a rather modest performance until LSTM showed up. One reason for the success of this recurrent network lies in its ability to handle the exploding/vanishing gradient problem, which remains a difficult issue to circumvent when training recurrent or very deep neural networks. In this paper, we present a comprehensive review that covers LSTM's formulation and training, relevant applications reported in the literature, and code resources implementing this model for a toy example.

Keywords Recurrent neural networks · Vanishing/exploding gradient · Long short-term memory · Deep learning
Corresponding authors: Greg Van Houdt ([email protected]); Gonzalo Nápoles ([email protected])

1 Faculty of Business Economics, Hasselt University, Agoralaan gebouw D, 3590 Diepenbeek, Belgium
2 Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 9, 1050 Brussels, Belgium
3 Department of Cognitive Science & Artificial Intelligence, Tilburg University, Warandelaan 2, 5037 AB Tilburg, The Netherlands

1 Introduction

Recurrent or very deep neural networks are difficult to train, as they often suffer from the exploding/vanishing gradient problem (Hochreiter 1991; Kolen and Kremer 2001). To overcome this shortcoming when learning long-term dependencies, the LSTM architecture (Hochreiter and Schmidhuber 1997a) was introduced. The learning ability of LSTM has impacted several fields from both a practical and a theoretical perspective, making it a state-of-the-art model. This led to the model being used by Google for its speech recognition (Sak et al. 2015) and to improve machine translations on Google Translate (Wu et al. 2016; Metz 2016). Amazon employs the model to improve Alexa's functionalities (Vogels 2016), and Facebook uses it for over 4 billion LSTM-based translations per day as of 2017 (Pino et al. 2017). Due to its high applicability and popularity, this neural architecture has also found its way into the world of gaming. For example, Google's DeepMind created AlphaStar (The AlphaStar Team 2019b), an artificial intelligence designed to play Starcraft II. Throughout its development, AlphaStar started to master the game (The AlphaStar Team 2019a), climbing up the global rankings to a degree unseen before. Research in this field is of course not limited to Starcraft II, as the research interest spans the entire RTS gaming genre due to its complexity (Zhang et al. 2019e).
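To give a flavour of the kind of toy example targeted by the code resources discussed later in this review, the snippet below trains a small LSTM on a synthetic sequence-regression task. It is a minimal sketch assuming TensorFlow/Keras is available; the dataset, layer sizes and hyperparameters are illustrative choices and are not taken from the paper.

    # Minimal LSTM toy example (illustrative only, assumes TensorFlow/Keras).
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # Synthetic task: predict the sum of each random sequence.
    rng = np.random.default_rng(0)
    X = rng.random((1000, 10, 1))   # 1000 sequences, 10 time steps, 1 feature
    y = X.sum(axis=1)               # target: sum over the time dimension

    model = keras.Sequential([
        layers.LSTM(32, input_shape=(10, 1)),  # LSTM layer with 32 hidden units
        layers.Dense(1),                       # regression output
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)

    print(model.predict(X[:1]))     # prediction for the first toy sequence

Because the LSTM cell propagates information through an additive cell state, gradients can flow across the ten time steps of this toy task without vanishing, which is precisely the property motivating the architecture reviewed here.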