Hindsight-Combined and Hindsight-Prioritized Experience Replay



1 Division of Information Science, Nara Institute of Science and Technology, Takayama Town, Ikoma, Nara 630-0192, Japan
{tan.renzo roel perez.tp7,kazushi}@is.naist.jp
2 School of Science and Engineering, Ateneo de Manila University, Katipunan Avenue, National Capital Region, 1108 Quezon City, Philippines
{rrtan,jpvergara}@ateneo.edu

Supported by the Japan Society for the Promotion of Science through the Grants-in-Aid for Scientific Research Program (KAKENHI 18K19821).

Abstract. Reinforcement learning has proved to be of great utility; execution, however, may be costly due to sampling inefficiency. An efficient method for training is experience replay, which recalls past experiences. Several experience replay techniques, namely, combined experience replay, hindsight experience replay, and prioritized experience replay, have been crafted, while their relative merits remain unclear. In the study, one proposes hybrid algorithms – hindsight-combined and hindsight-prioritized experience replay – and evaluates their performance against published baselines. Experimental results demonstrate the superior performance of hindsight-combined experience replay on an OpenAI Gym benchmark. Further, insight into the nonconvergence of hindsight-prioritized experience replay is presented towards the improvement of the approach.

Keywords: Experience replay · Deep Q-Network · Reinforcement learning · Sample efficiency · Hybrid algorithm

1 Introduction

Reinforcement learning [20] has been the subject of considerable research. Its uncomplicated formulation is capable of capturing a vast number of problems in artificial intelligence. Fields such as resource management [13], traffic signal control [2], and robotics [8] abound with practical applications. Generally, the learning problem is to control a system so as to maximize a numerical value representing a long-term objective [7]. One calls the learner the agent, and the agent is situated in an environment. The standard reinforcement learning formalism therefore corresponds to a decision-making framework consisting of an agent that interacts with an environment and improves its performance based on feedback. At each time step, the agent is given a state and it selects an action; the environment then presents a reward and a new state. By and large, the goal is to maximize the cumulative reward.
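
To make the interaction loop concrete, the following minimal sketch pairs a Gym-style episode with a bare-bones replay memory of the kind recalled by experience replay. It assumes the classic OpenAI Gym interface (reset/step returning an observation and a four-tuple); the CartPole-v1 environment, the random action choice, and the buffer and batch sizes are illustrative placeholders, not the configuration evaluated in the paper.

```python
import random
from collections import deque

import gym  # classic Gym API (pre-0.26) is assumed here

# Minimal sketch of the agent-environment loop described above, together with
# a bare-bones replay memory; all names and sizes are illustrative only.
env = gym.make("CartPole-v1")
replay_buffer = deque(maxlen=10000)  # stores past transitions for reuse

state = env.reset()
cumulative_reward = 0.0
done = False

while not done:
    # The agent observes the current state and selects an action
    # (a random action stands in for a learned policy).
    action = env.action_space.sample()

    # The environment presents a reward and a new state.
    next_state, reward, done, _ = env.step(action)

    # Experience replay keeps the transition so it can be reused for training.
    replay_buffer.append((state, action, reward, next_state, done))

    cumulative_reward += reward
    state = next_state

# A learner would repeatedly sample minibatches of past experiences like this.
if len(replay_buffer) >= 32:
    minibatch = random.sample(replay_buffer, 32)

print("Episode return:", cumulative_reward)
```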


While reinforcement learning shows promise, implementation in real-world contexts can be costly because of sampling inefficiency. This means that a multitude of runs is needed for the algorithm to achieve success. A way to address such a complication is through the utilization of experience replay [11], where previous experiences are stored and reused. As an aside, there are other methods through which one may grapple with the problem; recent alternatives include using Gaussian processes [5] and using babbling [9,14] to speed up learning. The paper,