Attentive multi-view reinforcement learning



ORIGINAL ARTICLE

Attentive multi-view reinforcement learning

Yueyue Hu¹ · Shiliang Sun¹ · Xin Xu² · Jing Zhao¹

¹ School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
² College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China

* Corresponding author: Shiliang Sun, [email protected]

Received: 28 November 2019 / Accepted: 10 April 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

The reinforcement learning process usually takes millions of steps when learning from scratch, owing to limited observational experience. More precisely, the representation approximated by a single deep network is often too limited for reinforcement learning agents. In this paper, we propose a novel multi-view deep attention network (MvDAN), which introduces multi-view representation learning into the reinforcement learning framework for the first time. Following a multi-view scheme of function approximation, the proposed model approximates multiple view-specific policy or value functions in parallel, each estimated from a mid-level representation, and integrates these functions with an attention mechanism to generate a comprehensive strategy. Furthermore, we develop multi-view generalized policy improvement to jointly optimize all policies instead of a single one. Experimental results on eight Atari benchmarks show that, compared with the single-view function approximation scheme of existing reinforcement learning methods, MvDAN outperforms state-of-the-art methods while converging faster and training more stably.

Keywords Deep reinforcement learning · Function approximation · Multi-view learning · Representation learning
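The section itself gives no implementation, so the following is a minimal PyTorch sketch of the mechanism the abstract describes: several view-specific heads approximate Q-functions in parallel from mid-level representations, and softmax attention weights fuse them into one comprehensive estimate. All names (MvDANSketch), the flat observation input, and the layer sizes are illustrative assumptions rather than the authors' architecture; the paper evaluates on Atari, where convolutional encoders would replace the linear ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MvDANSketch(nn.Module):
    """Hypothetical sketch of attention-weighted multi-view value estimation.

    Each view has its own encoder and Q-head (view-specific function
    approximation); an attention scorer turns per-view features into
    softmax weights used to fuse the view-specific Q-values.
    """

    def __init__(self, obs_dim: int, n_actions: int, n_views: int = 3, hidden: int = 128):
        super().__init__()
        # One mid-level encoder per view (linear here for simplicity).
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU()) for _ in range(n_views)]
        )
        # One Q-head per view: a view-specific value function.
        self.q_heads = nn.ModuleList([nn.Linear(hidden, n_actions) for _ in range(n_views)])
        # Scores each view's representation; softmax over views gives attention.
        self.attn_score = nn.Linear(hidden, 1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        feats = [enc(obs) for enc in self.encoders]                     # V x (B, H)
        q_views = torch.stack(
            [head(f) for head, f in zip(self.q_heads, feats)], dim=1
        )                                                               # (B, V, A)
        scores = torch.cat([self.attn_score(f) for f in feats], dim=1)  # (B, V)
        weights = F.softmax(scores, dim=1).unsqueeze(-1)                # (B, V, 1)
        return (weights * q_views).sum(dim=1)                           # fused Q: (B, A)


# Example: fuse three views of a 64-dim observation into Q-values for 6 actions.
q = MvDANSketch(obs_dim=64, n_actions=6)(torch.randn(32, 64))
print(q.shape)  # torch.Size([32, 6])
```

In this sketch the attention weights are computed per state, so the model can lean on whichever view is most informative for the current observation, which matches the abstract's notion of integrating view-specific functions into a comprehensive strategy.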

1 Introduction

The basic mechanism of deep reinforcement learning is to provide strategies for sequential decision-making problems: a deep learning framework transforms raw data into a suitable internal representation, with the aim of maximizing the expected discounted future return [29]. Deep learning is a representation learning method with multiple levels of representation, in which multi-layer perceptrons convert the information of one layer into a higher, more abstract representation at the next [14]. Recently, deep learning has achieved impressive results in many areas [7], including the use of deep neural networks as generalizing function approximators for reinforcement learning. In particular, DeepMind has utilized deep neural networks as nonlinear approximators for reinforcement learning agents, making it possible to learn strategies from high-dimensional data and to achieve performance beyond human level in a variety of game simulators such as Atari [19, 20], the game of Go [23, 25], Dota 2, and StarCraft II [32]. However, unlike existing deep reinforcement learning models, which exploit the property of hierarchical representation, in this paper we focus on representation consistency and complementarity at the decision level by conducting function approximation in deep