Asynchronous framework with Reptile+ algorithm to meta learn partially observable Markov decision process

Dang Quang Nguyen1 · Ngo Anh Vien2 · Viet-Hung Dang3 · TaeChoong Chung4

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Meta-learning has recently received much attention across a wide variety of deep reinforcement learning (DRL) settings. Without meta-learning, a deep neural network controller must be trained from scratch for each specific control task using a large amount of data. This way of training has shown many limitations when handling different but related tasks. Meta-learning on control domains has therefore become a powerful tool for transfer learning across related tasks. However, it is widely known that meta-learning requires massive computation and training time. This paper proposes a novel DRL framework called HCGF-R2-DDPG (Hybrid CPU/GPU Framework for Reptile+ and Recurrent Deep Deterministic Policy Gradient). HCGF-R2-DDPG integrates meta-learning into a general asynchronous training architecture. The proposed framework allows utilising both CPU and GPU to boost the training speed of the meta-network initialisation. We evaluate HCGF-R2-DDPG on various Partially Observable Markov Decision Process (POMDP) domains.

Keywords Meta learning · Deep reinforcement learning · Partially observable Markov decision process · Asynchronous framework · Recurrent deep deterministic policy gradient

 TaeChoong Chung
[email protected]

Dang Quang Nguyen
[email protected]

Ngo Anh Vien
[email protected]

Viet-Hung Dang
[email protected]

1 Institute of Research and Development, Duy Tan University, Da Nang, 550000, Vietnam
2 School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK
3 Faculty of Information Technology, Duy Tan University, Duy Tan, Vietnam
4 Department of Computer Science and Engineering, Kyung Hee University, Gyeonggi-do, South Korea

1 Introduction

Dreaming of a general artificial intelligence that can generalise across many tasks has been a long-standing desideratum. Researchers have moved toward

this target with a lot of milestones, such as advances in deep learning and meta-learning. Meta-learning has the ability to learn various tasks drawn from a single distribution. Specifically, it enables the transfer of knowledge from numerous different tasks in a distribution to boost the learning of a new task from the same distribution. With meta-learning, the process of learning new tasks can be more efficient in terms of training time and sample usage. However, it is widely known that meta-learning requires massive computation and training time to learn a significant number of tasks in order to prepare the network coefficients before the training of a new task can efficiently start. To some extent, it can be understood as supervised learning over tasks instead of directly collected samples. In the literature, there are several different meta-learning approaches, e.g. metric-based, model-b
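The "learning over tasks" idea above can be made concrete with a minimal sketch of a Reptile-style meta-update, the first-order scheme that Reptile+ extends. The toy task here, fitting a one-dimensional quadratic with a task-specific optimum, is a hypothetical stand-in for the POMDP control tasks considered in this paper; all function names and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import random

def inner_adapt(theta, task_optimum, lr=0.1, steps=20):
    """Inner loop: plain gradient descent on the task loss f(x) = (x - task_optimum)^2."""
    x = theta
    for _ in range(steps):
        grad = 2.0 * (x - task_optimum)
        x -= lr * grad
    return x

def reptile_meta_train(theta0, task_optima, meta_lr=0.5, meta_iters=100):
    """Outer loop: repeatedly sample a task, adapt to it, and move the
    shared initialisation toward the adapted solution (Reptile update:
    theta <- theta + eps * (phi - theta))."""
    theta = theta0
    for _ in range(meta_iters):
        opt = random.choice(task_optima)   # sample a task from the distribution
        phi = inner_adapt(theta, opt)      # task-specific adaptation
        theta += meta_lr * (phi - theta)   # meta-update toward the adapted weights
    return theta

random.seed(0)
tasks = [-1.0, 0.0, 1.0]                  # a task distribution centred at 0
theta = reptile_meta_train(5.0, tasks)
print(theta)                              # initialisation drawn toward the task centre
```

After meta-training, the learned initialisation sits near the centre of the task distribution, so adapting to any new task from the same distribution takes far fewer inner-loop steps than starting from scratch; this is the sample-efficiency benefit described above.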