Asynchronous framework with Reptile+ algorithm to meta learn partially observable Markov decision process

Dang Quang Nguyen1 · Ngo Anh Vien2 · Viet-Hung Dang3 · TaeChoong Chung4

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Meta-learning has recently received much attention across a wide variety of deep reinforcement learning (DRL) settings. Without meta-learning, a deep neural network controller must be trained from scratch for each specific control task using a large amount of data. This way of training has shown many limitations when handling different but related tasks. Meta-learning on control domains has therefore become a powerful tool for transfer learning across related tasks. However, it is widely known that meta-learning requires massive computation and training time. This paper proposes a novel DRL framework called HCGF-R2-DDPG (Hybrid CPU/GPU Framework for Reptile+ and Recurrent Deep Deterministic Policy Gradient). HCGF-R2-DDPG integrates meta-learning into a general asynchronous training architecture. The proposed framework allows utilising both CPU and GPU to boost the training speed of the meta-network initialisation. We evaluate HCGF-R2-DDPG on various Partially Observable Markov Decision Process (POMDP) domains.

Keywords Meta learning · Deep reinforcement learning · Partially observable Markov decision process · Asynchronous framework · Recurrent deep deterministic policy gradient

 TaeChoong Chung
[email protected]

Dang Quang Nguyen
[email protected]

Ngo Anh Vien
[email protected]

Viet-Hung Dang
[email protected]

1 Institute of Research and Development, Duy Tan University, Da Nang, 550000, Vietnam
2 School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK
3 Faculty of Information Technology, Duy Tan University, Duy Tan, Vietnam
4 Department of Computer Science and Engineering, Kyung Hee University, Gyeonggi-do, South Korea

1 Introduction

Dreaming of a general artificial intelligence that can generalise across many tasks has been a long-standing desideratum. Researchers have moved toward

this target with a lot of milestones, such as advances in deep learning and meta-learning. Meta-learning has the ability to learn various tasks drawn from a single distribution. Specifically, it enables the transfer of knowledge from numerous different tasks in a distribution to boost the learning of a new task from the same distribution. With meta-learning, the process of learning new tasks can be more efficient in terms of training time and sample usage. However, it is widely known that meta-learning requires massive computation and training time to learn a significant number of tasks in order to prepare the network coefficients before the training of a new task can efficiently start. To some extent, it can be understood as supervised learning over tasks instead of directly collected samples. In the literature, there are several different meta-learning approaches, e.g. metric-based, model-b
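The "learning over tasks" idea above can be made concrete with a minimal sketch of a Reptile-style meta-update, the first-order scheme that Reptile+ extends. The toy task here, fitting a one-dimensional quadratic with a task-specific optimum, is a hypothetical stand-in for the POMDP control tasks considered in this paper; all function names and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import random

def inner_adapt(theta, task_optimum, lr=0.1, steps=20):
    """Inner loop: plain gradient descent on the task loss f(x) = (x - task_optimum)^2."""
    x = theta
    for _ in range(steps):
        grad = 2.0 * (x - task_optimum)
        x -= lr * grad
    return x

def reptile_meta_train(theta0, task_optima, meta_lr=0.5, meta_iters=100):
    """Outer loop: repeatedly sample a task, adapt to it, and move the
    shared initialisation toward the adapted solution (Reptile update:
    theta <- theta + eps * (phi - theta))."""
    theta = theta0
    for _ in range(meta_iters):
        opt = random.choice(task_optima)   # sample a task from the distribution
        phi = inner_adapt(theta, opt)      # task-specific adaptation
        theta += meta_lr * (phi - theta)   # meta-update toward the adapted weights
    return theta

random.seed(0)
tasks = [-1.0, 0.0, 1.0]                  # a task distribution centred at 0
theta = reptile_meta_train(5.0, tasks)
print(theta)                              # initialisation drawn toward the task centre
```

After meta-training, the learned initialisation sits near the centre of the task distribution, so adapting to any new task from the same distribution takes far fewer inner-loop steps than starting from scratch; this is the sample-efficiency benefit described above.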