Parallel Reinforcement Learning with Linear Function Approximation

Abstract. In this paper, we investigate the use of parallelization in reinforcement learning (RL), with the goal of learning optimal policies for single-agent RL problems more quickly by using parallel hardware. Our approach is based on agents using the SARSA(λ) algorithm, with value functions represented using linear function approximators. In our proposed method, each agent learns independently in a separate simulation of the single-agent problem. The agents periodically exchange information extracted from the weights of their approximators, accelerating convergence towards the optimal policy. We develop three increasingly efficient versions of this approach to parallel RL, and present empirical results for an implementation of the methods on a Beowulf cluster.

1 Introduction

Reinforcement learning (RL) is by far the most popular machine learning method employed in agent applications, owing to its suitability to the paradigm of situated agents. However, real-world applications of RL have been hampered by the fact that the standard algorithms do not scale well to complex, feature-rich environments. Scaling up RL is therefore of crucial importance if these techniques are to make their way into practical real-world solutions.

The approach we take in this paper is to parallelize the RL process in order to speed up convergence. The primary goal of such an approach is to find good solutions to single-agent learning problems more quickly than is possible on non-parallel hardware. This focus differs from most previous work on multi-agent learning [1,2], which is primarily concerned with agents that share an environment and learn to coordinate or compete. While there have been preliminary analyses [3,4] demonstrating the promise of a parallel approach to RL, the cost of inter-agent communication has tended to be excluded from the analysis, and there are currently no parallel RL algorithms that are practical for solving large-scale problems on real parallel hardware.

In this paper, we present an approach in which the parallel agents learn using identical simulations of a given single-agent RL domain. Each parallel agent uses SARSA, a popular RL algorithm, and represents its value function with a linear function approximator whose features are generated using tile coding [5].
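As a concrete illustration of such a learner (a minimal sketch only, not the implementation evaluated in this paper), the code below stores one weight vector per action over binary tile-coding features and applies SARSA(λ) updates with replacing eligibility traces. The class name TileCodedSarsa, the per-action weight layout, and the replacing-trace variant are illustrative assumptions.

```python
import numpy as np


class TileCodedSarsa:
    """Sketch of a linear SARSA(lambda) learner over binary tile-coding features."""

    def __init__(self, num_weights, num_actions, alpha=0.1, gamma=0.99, lam=0.9):
        self.alpha = alpha                         # step size
        self.gamma = gamma                         # discount factor
        self.lam = lam                             # eligibility-trace decay
        # Q(s, a) is the sum of the weights of the tiles active in state s.
        self.w = np.zeros((num_actions, num_weights))
        self.z = np.zeros_like(self.w)             # eligibility traces

    def q_value(self, active_tiles, action):
        return self.w[action, active_tiles].sum()

    def update(self, active_tiles, action, reward, next_tiles, next_action, done):
        """One SARSA(lambda) backup for the transition (s, a, r, s', a')."""
        target = reward if done else reward + self.gamma * self.q_value(next_tiles, next_action)
        delta = target - self.q_value(active_tiles, action)
        self.z *= self.gamma * self.lam            # decay all traces
        self.z[action, active_tiles] = 1.0         # replacing traces on active features
        self.w += self.alpha * delta * self.z
        if done:                                   # reset traces at episode end
            self.z[:] = 0.0
```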

Agents extract information from their value functions, which is then exchanged by passing messages over a limited-bandwidth network. By aggregating the information each agent has learned from the environment, the agents converge more quickly towards an optimal policy. We first present an algorithm in which agents periodically merge their approximators, then refine the method to reduce its communication overhead. Finally, we define a similar asynchronous algorithm that eliminates the synchronization penalty suffered by the other algorithms.
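To picture the periodic merge step, the sketch below simply averages the agents' weight vectors with a single collective operation. This is a simplification made for illustration, not the scheme developed in this paper: the use of mpi4py, the choice of plain element-wise averaging, and the merge_interval trigger are all assumptions.

```python
import numpy as np
from mpi4py import MPI


def merge_weights(local_w, comm=MPI.COMM_WORLD):
    """Replace local weights with the element-wise average over all parallel agents."""
    merged = np.empty_like(local_w)
    comm.Allreduce(local_w, merged, op=MPI.SUM)   # sum weight vectors across ranks
    merged /= comm.Get_size()                     # divide by the number of agents
    return merged


# Inside each agent's training loop (sketch):
# if episode % merge_interval == 0:
#     agent.w = merge_weights(agent.w)
```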