Parallel Reinforcement Learning with Linear Function Approximation

Abstract. In this paper, we investigate the use of parallelization in reinforcement learning (RL), with the goal of learning optimal policies for single-agent RL problems more quickly by using parallel hardware. Our approach is based on agents using the SARSA(λ) algorithm, with value functions represented using linear function approximators. In our proposed method, each agent learns independently in a separate simulation of the single-agent problem. The agents periodically exchange information extracted from the weights of their approximators, accelerating convergence towards the optimal policy. We develop three increasingly efficient versions of this approach to parallel RL, and present empirical results for an implementation of the methods on a Beowulf cluster.

1 Introduction

Reinforcement learning (RL) is by far the most popular machine learning method employed in agent applications, owing to its suitability to the paradigm of situated agents. However, real-world applications of RL have been hampered by the fact that the standard algorithms do not scale well to complex, feature-rich environments. Scaling up RL is therefore of crucial importance if these techniques are to make their way into practical real-world solutions.

The approach we take in this paper is to parallelize the RL process in order to speed up convergence. The primary goal of such an approach is to find good solutions to single-agent learning problems more quickly than is possible on non-parallel hardware. This focus differs from most previous work on multi-agent learning [1,2], which is primarily concerned with agents that share an environment and learn to coordinate or compete. While there have been preliminary analyses [3,4] demonstrating the promise of a parallel approach to RL, the cost of inter-agent communication has tended to be excluded from the analysis, and there are currently no parallel RL algorithms that are practical for solving large-scale problems on real parallel hardware.

In this paper, we present an approach in which the parallel agents learn using identical simulations of a given single-agent RL domain. Each parallel agent uses SARSA, a popular RL algorithm, and represents its value function with a linear function approximator whose features are generated using tile coding [5].
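As a concrete illustration of such a learner (a minimal sketch only, not the implementation evaluated in this paper), the code below stores one weight vector per action over binary tile-coding features and applies SARSA(λ) updates with replacing eligibility traces. The class name TileCodedSarsa, the per-action weight layout, and the replacing-trace variant are illustrative assumptions.

```python
import numpy as np


class TileCodedSarsa:
    """Sketch of a linear SARSA(lambda) learner over binary tile-coding features."""

    def __init__(self, num_weights, num_actions, alpha=0.1, gamma=0.99, lam=0.9):
        self.alpha = alpha                         # step size
        self.gamma = gamma                         # discount factor
        self.lam = lam                             # eligibility-trace decay
        # Q(s, a) is the sum of the weights of the tiles active in state s.
        self.w = np.zeros((num_actions, num_weights))
        self.z = np.zeros_like(self.w)             # eligibility traces

    def q_value(self, active_tiles, action):
        return self.w[action, active_tiles].sum()

    def update(self, active_tiles, action, reward, next_tiles, next_action, done):
        """One SARSA(lambda) backup for the transition (s, a, r, s', a')."""
        target = reward if done else reward + self.gamma * self.q_value(next_tiles, next_action)
        delta = target - self.q_value(active_tiles, action)
        self.z *= self.gamma * self.lam            # decay all traces
        self.z[action, active_tiles] = 1.0         # replacing traces on active features
        self.w += self.alpha * delta * self.z
        if done:                                   # reset traces at episode end
            self.z[:] = 0.0
```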

Agents extract information from their value functions, which is then exchanged by passing messages over a limited-bandwidth network. By aggregating the information each agent has learned from the environment, the agents converge more quickly towards an optimal policy. We first present an algorithm in which agents periodically merge their approximators, then refine the method to reduce its communication overhead. Finally, we define a similar asynchronous algorithm that eliminates the synchronization penalty suffered by the other algorithms.
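To picture the periodic merge step, the sketch below simply averages the agents' weight vectors with a single collective operation. This is a simplification made for illustration, not the scheme developed in this paper: the use of mpi4py, the choice of plain element-wise averaging, and the merge_interval trigger are all assumptions.

```python
import numpy as np
from mpi4py import MPI


def merge_weights(local_w, comm=MPI.COMM_WORLD):
    """Replace local weights with the element-wise average over all parallel agents."""
    merged = np.empty_like(local_w)
    comm.Allreduce(local_w, merged, op=MPI.SUM)   # sum weight vectors across ranks
    merged /= comm.Get_size()                     # divide by the number of agents
    return merged


# Inside each agent's training loop (sketch):
# if episode % merge_interval == 0:
#     agent.w = merge_weights(agent.w)
```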