Active deep Q-learning with demonstration

Si-An Chen¹ · Voot Tangkaratt² · Hsuan-Tien Lin¹ · Masashi Sugiyama²,³

Received: 26 November 2018 / Revised: 26 June 2019 / Accepted: 26 September 2019
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Abstract
Reinforcement learning (RL) is a machine learning technique that aims to learn how to take actions in an environment so as to maximize some notion of reward. Recent research has shown that although the learning efficiency of RL can be improved with expert demonstration, obtaining enough demonstration usually takes considerable effort, which prevents training decent RL agents with expert demonstration in practice. In this work, we propose Active Reinforcement Learning with Demonstration, a new framework that reduces the demonstration effort of RL by allowing the RL agent to query for demonstration actively during training. Under this framework, we propose Active deep Q-Network, a novel query strategy based on a classical RL algorithm called the deep Q-network (DQN). The proposed algorithm dynamically estimates the uncertainty of recent states and utilizes the queried demonstration data by optimizing a supervised loss in addition to the usual DQN loss. We propose two methods of estimating the uncertainty, based on two state-of-the-art DQN models: the divergence of bootstrapped DQN and the variance of noisy DQN. The empirical results validate that both methods not only learn faster than other passive expert-demonstration methods with the same amount of demonstration but also reach super-expert levels of performance across four different tasks.

Keywords Active learning · Reinforcement learning · Learning from demonstration
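To make the uncertainty-gated query idea concrete, the following is a minimal, illustrative sketch in the spirit of the bootstrapped-DQN variant: disagreement among bootstrapped Q-heads serves as the uncertainty estimate for a state, and a demonstration is requested only when that disagreement exceeds a threshold. This is not the authors' implementation; the disagreement measure, head count, threshold, and function names are all assumptions for illustration.

```python
import numpy as np

def bootstrap_uncertainty(q_heads):
    """Disagreement among bootstrapped Q-heads for one state.

    q_heads: array of shape (n_heads, n_actions) holding each head's
    Q-value estimates. As a simple proxy for the divergence used in the
    paper, we take the standard deviation of per-head Q-values for the
    action preferred on average (an illustrative choice, not the
    paper's exact divergence measure).
    """
    q_heads = np.asarray(q_heads, dtype=float)
    mean_q = q_heads.mean(axis=0)             # average Q-value per action
    greedy = int(mean_q.argmax())             # action preferred on average
    return float(q_heads[:, greedy].std())    # head disagreement on that action

def should_query(q_heads, threshold=0.3):
    """Query the expert only when the agent is uncertain
    (the threshold is an assumed hyperparameter)."""
    return bootstrap_uncertainty(q_heads) > threshold

# Example: three heads, four actions; the heads disagree, so the
# agent would query the expert for a demonstration in this state.
heads = [[1.0, 0.2, 0.1, 0.0],
         [0.1, 1.2, 0.0, 0.3],
         [0.9, 0.8, 0.2, 0.1]]
print(should_query(heads))  # True
```

In a full agent, the demonstrations gathered this way would be trained on with a supervised loss added to the usual DQN temporal-difference loss, as the abstract describes; the sketch above covers only the query decision.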

Editors: Po-Ling Loh, Evimaria Terzi, Antti Ukkonen, Karsten Borgwardt.

Si-An Chen [email protected]
Voot Tangkaratt [email protected]
Hsuan-Tien Lin [email protected]
Masashi Sugiyama [email protected]

1 Present Address: CSIE Department, National Taiwan University, Taipei, Taiwan
2 RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
3 Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan

1 Introduction

Sequential decision making is a common and important problem in the real world. For instance, to achieve its goal, a robot should produce a sequence of decisions or movements according to its observations over time. A recommender system should decide when, and which item or advertisement, to display to a customer in a sequential manner. For sequential decision making, reinforcement learning (RL) (Sutton and Barto 1998) has been recognized as an effective framework that learns from interaction with the environment. Thanks to advances in deep learning and computational hardware, deep RL has achieved a number of successes in various fields, such as end-to-end policy search for motor control (Watter et al. 2015), deep Q-networks for playing Atari games (Mnih et al. 2015), and combining RL and tree search for playing the game of Go (Silver et al. 2016).