

ORIGINAL ARTICLE

Evaluating skills in hierarchical reinforcement learning

Marzieh Davoodabadi Farahani¹ · Nasser Mozayani¹

Received: 25 August 2019 / Accepted: 2 May 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

¹ Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran

Abstract
Previous work on automatically acquiring skills for hierarchical reinforcement learning algorithms cites benefits such as mitigating the curse of dimensionality, improving exploration, and speeding up value propagation, but it has paid little attention to evaluating the effect of each skill on these factors. In this paper, we show that, depending on the given task, a skill may or may not be useful for learning it. In addition, related work on automatic skill acquisition focuses on detecting subgoals, i.e., the skill termination condition, and provides no precise method for extracting the initiation set of a skill. We propose not only two methods for evaluating skills but also two methods for pruning their initiation sets. Experimental results show significant improvements in learning different test domains after evaluating and pruning skills.

Keywords Hierarchical reinforcement learning · Temporal abstraction · Option · Skill · Option evaluation

1 Introduction

Reinforcement learning (RL) has been successfully used in various applications; nonetheless, many classical RL methods suffer from the following problems:

• Curse of dimensionality: They are not able to handle high-dimensional problems in a reasonable time [1].
• Transferring knowledge: It is not clearly specified how to re-use the learned knowledge among similar state spaces [2].
• Slow exploration due to random walking: If the state space is large and the reward is not granted until the goal state is reached, the agent just walks randomly during the early steps of its learning phase before it can reach the goal state [3].
• Signal decay over long distances: In RL problems, the reward signal must be back-propagated over long distances. If the discount factor (γ) is not high enough in large problems, the value vanishes too rapidly, while high values of γ make it impossible to discriminate between the best and worst actions in each state when function approximation is used [4]. A short numerical sketch after this list illustrates the trade-off.
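To make the signal-decay point concrete, here is a minimal numerical sketch in plain Python, assuming a single terminal reward of 1 and hand-picked distances and discount factors; none of these numbers come from the paper.

```python
# Illustrative only: a terminal reward of 1 discounted back over d steps.
for gamma in (0.9, 0.99):
    for d in (10, 50, 100):
        print(f"gamma={gamma}, distance={d}: back-propagated value = {gamma ** d:.6f}")

# With gamma = 0.9 the value after 100 steps is roughly 0.000027, so the signal
# has effectively vanished; with gamma = 0.99 it is roughly 0.366, but then
# neighbouring states have nearly identical values, which is exactly what makes
# it hard for a function approximator to separate the best action from the worst.
```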

To overcome the above limitations and improve the performance of RL algorithms, hierarchical reinforcement learning (HRL) algorithms have been proposed. They use abstractions to divide a task into a set of subtasks with smaller state spaces and thereby tackle the curse of dimensionality. Temporal abstraction is one kind of abstraction; it refers to creating temporally extended courses of action and encapsulating primitive actions into a single action. Temporal abstractions, also called skills, allow explicit use of similarity or re-use of sub-policies [3].
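To make the notion of a skill concrete, the following is a minimal sketch, assuming the standard options formulation of a skill as an initiation set, an intra-option policy, and a termination condition. The class name, the grid-world states, the doorway subgoal, and the toy policy are all illustrative assumptions, not the authors' construction.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = int

@dataclass
class Option:
    """A skill in the options framework: the agent may invoke the option in any
    state of its initiation set, then follows the option's internal policy until
    the termination condition fires."""
    initiation_set: Set[State]             # states where the option may be invoked
    policy: Callable[[State], Action]      # intra-option policy
    termination: Callable[[State], float]  # probability of terminating in a state

    def can_start(self, state: State) -> bool:
        return state in self.initiation_set

# Hypothetical example: a "reach the doorway" skill in a small grid world.
doorway = (5, 3)
go_to_doorway = Option(
    initiation_set={(x, y) for x in range(6) for y in range(6)},   # the left room
    policy=lambda s: 0 if s[0] < doorway[0] else 1,                # toy policy: action 0 until the doorway column, then action 1
    termination=lambda s: 1.0 if s == doorway else 0.0,            # stop at the subgoal
)

print(go_to_doorway.can_start((2, 2)))     # True: inside the initiation set
print(go_to_doorway.termination(doorway))  # 1.0: the skill ends at its subgoal
```

In this picture, pruning an initiation set simply means shrinking `initiation_set` to the states from which invoking the skill is actually beneficial for the task at hand.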
