FARM: A Flexible Accelerator for Recurrent and Memory Augmented Neural Networks
Nagadastagiri Challapalle1 · Sahithi Rampalli1 · Nicholas Jao1 · Akshaykrishna Ramanathan1 · John Sampson1 · Vijaykrishnan Narayanan1

Received: 20 September 2019 / Revised: 25 April 2020 / Accepted: 20 May 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract

Recently, Memory Augmented Neural Networks (MANNs), a class of Deep Neural Networks (DNNs), have become prominent owing to their ability to effectively capture long-term dependencies in several Natural Language Processing (NLP) tasks. These networks augment conventional DNNs by incorporating memory and attention mechanisms external to the network to capture relevant information. Several MANN architectures have shown particular benefits in NLP tasks by augmenting an underlying Recurrent Neural Network (RNN) with external memory using attention mechanisms. Unlike conventional DNNs, whose computation time is dominated by MAC operations, MANNs exhibit more diverse behavior. In addition to MACs, the attention mechanisms of MANNs also involve operations such as similarity measures, sorting, weighted memory access, and pair-wise arithmetic. Due to this greater diversity of operations, MANNs are not trivially accelerated by the techniques used in existing DNN accelerators. In this work, we present FARM, an end-to-end hardware accelerator architecture for the inference of RNNs and several MANN variants, such as the Differentiable Neural Computer (DNC), the Neural Turing Machine (NTM), and a Meta-learning model. FARM achieves average speedups of 30x-190x and 80x-100x over CPU and GPU implementations, respectively. To address the remaining memory bottlenecks in FARM, we then propose the FARM-PIM architecture, which augments FARM with in-memory compute support for MAC and content-similarity operations in order to reduce data traversal costs. FARM-PIM offers an additional speedup of 1.5x over FARM. Additionally, we consider an efficiency-oriented version of the PIM implementation, FARM-PIM-LP, which trades a 20% performance reduction relative to FARM for a 4x average reduction in power consumption.

Keywords Neural network · Attention mechanism · Memory augmentation · In-memory computing · Hardware accelerator
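As a rough illustration of the operation mix described above, the following numpy sketch implements a content-based read of the kind used in NTM/DNC-style attention: a cosine-similarity measure against every memory slot, a softmax-style weighting, and a weighted memory access. The function name, memory sizes, and key strength are illustrative assumptions and are not taken from the paper.

import numpy as np

def content_based_read(memory, key, beta):
    # memory: (N, M) array of N memory slots, each of width M (illustrative layout)
    # key:    (M,) query vector produced by the controller
    # beta:   scalar key strength that sharpens the attention distribution
    # Similarity measure: cosine similarity between the key and every memory slot
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    # Pair-wise arithmetic and normalization: softmax over the scaled similarities
    weights = np.exp(beta * (sims - sims.max()))
    weights = weights / weights.sum()
    # Weighted memory access: the read vector is a weighted sum of all slots
    read_vector = weights @ memory
    return read_vector, weights

memory = np.random.randn(128, 64)   # 128 slots of width 64 (assumed sizes)
key = np.random.randn(64)
read_vector, read_weights = content_based_read(memory, key, beta=2.0)

Unlike a convolution or fully connected layer, this access pattern mixes reductions, element-wise arithmetic, and normalization over the entire memory, which is why MAC-centric DNN accelerators do not map onto it directly.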
Corresponding author: Nagadastagiri Challapalle
[email protected]
Extended author information available on the last page of the article.

1 Introduction

Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are the two most widely used Deep Neural Network (DNN) architectures for a wide range of tasks in Natural Language Processing (NLP) and computer vision. Recently, RNNs, such as Long Short Term Memory (LSTM) networks, have surpassed CNNs for NLP tasks in terms of accuracy, as they inherently capture the sequential and temporal contextual information in the input features [36]. RNNs maintain a hidden state to accumulate information from each time step in the input sequence to make the final prediction. They fuse relevant information about the sequence of i