RESEARCH ARTICLE

Distributed multi-agent temporal-difference learning with full neighbor information

Zhinan Peng¹ · Jiangping Hu¹ · Rui Luo¹ · Bijoy K. Ghosh¹,²

Received: 4 July 2020 / Revised: 15 September 2020 / Accepted: 15 September 2020
© South China University of Technology, Academy of Mathematics and Systems Science, CAS and Springer-Verlag GmbH Germany, part of Springer Nature 2020

* Corresponding author: Jiangping Hu ([email protected])

¹ School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China
² Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409-1042, USA

Abstract  This paper presents a novel distributed multi-agent temporal-difference learning framework for value function approximation that allows agents to use the information from all of their neighbors rather than from a single neighbor. With full neighbor information, the proposed framework (1) achieves a faster convergence rate and (2) is more robust than state-of-the-art approaches. Based on this framework, we then propose a distributed multi-agent discounted temporal-difference algorithm and a distributed multi-agent average-cost temporal-difference learning algorithm, and we provide theoretical convergence proofs for both algorithms. Numerical simulation results show that the proposed algorithms are superior to the gossip-based algorithm in convergence speed and in robustness to noise and time-varying network topology.

Keywords  Distributed algorithm · Reinforcement learning · Temporal-difference learning · Multi-agent systems
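To make the contrast between a gossip-based update and the use of full neighbor information concrete, the sketch below compares the two parameter-combination steps in schematic form. It is an illustrative assumption-laden sketch, not the paper's actual update rules: the uniform weights, function names and data layout are placeholders introduced here for exposition.

```python
import numpy as np

def gossip_combine(theta, neighbors, i, rng):
    """Gossip-style step: agent i mixes its parameter vector with ONE
    randomly chosen neighbor (the baseline compared against in the paper).
    Uniform 1/2-1/2 mixing is assumed for illustration."""
    j = rng.choice(neighbors[i])
    return 0.5 * (theta[i] + theta[j])

def full_neighbor_combine(theta, neighbors, i):
    """Full-neighbor step: agent i combines the parameter vectors of ALL
    its neighbors and itself (uniform weights assumed for illustration)."""
    idx = [i] + list(neighbors[i])
    return np.mean([theta[k] for k in idx], axis=0)
```

Here `theta` is a list of per-agent parameter vectors and `neighbors[i]` lists the indices of agent i's neighbors; in either scheme, each agent would follow the combination step with its own local TD update.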

1 Introduction

Reinforcement learning (RL) has been widely applied to sequential decision making, intelligent control and automation engineering [1–7]. In 2016, AlphaGo defeated the top professional Go player, an achievement that invigorated research in artificial intelligence (AI). In the meantime, RL, one of the key technologies behind AlphaGo, has drawn increasing attention from both academic researchers and industrial practitioners [8, 9]. As one of the most popular RL methods, temporal-difference (TD) learning has been shown to be effective for model-free sequential decision-making problems in Markov decision processes (MDPs) [10, 11]. Stochastic approximation methods [12, 13] and gradient-based TD methods [14, 15] have been proposed to estimate the mean square projected Bellman error (MSPBE); these methods scale linearly with the number of features in terms of computation time and memory usage.
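As a reference point for the single-agent setting discussed above, the following sketch shows classical TD(0) with linear value function approximation. The environment interface (`reset`, `sample_action`, `step`), the feature map `phi`, and the step size are placeholders introduced for illustration; this is the standard single-agent routine, not the distributed multi-agent method developed in this paper.

```python
import numpy as np

def td0_linear(env, phi, num_features, num_episodes=100, alpha=0.05, gamma=0.99):
    """Classical single-agent TD(0) with a linear value function V(s) ~ phi(s)·w.

    env : placeholder environment with reset() -> s, sample_action() -> a,
          and step(a) -> (s_next, reward, done)
    phi : feature map, phi(s) -> np.ndarray of length num_features
    """
    w = np.zeros(num_features)
    for _ in range(num_episodes):
        s = env.reset()
        done = False
        while not done:
            a = env.sample_action()                      # fixed behavior policy
            s_next, r, done = env.step(a)
            v_next = 0.0 if done else phi(s_next) @ w    # bootstrapped value of s'
            delta = r + gamma * v_next - phi(s) @ w      # TD error
            w += alpha * delta * phi(s)                  # semi-gradient update
            s = s_next
    return w
```

Both the computation and the memory of each update are linear in `num_features`, which is the scaling property noted above.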

Making TD learning distributed is therefore of vital importance: distributed TD learning with linear value function approximation has remained one of the most important open problems in this research field for more than a decade. A more general and practical challenge is how to parameterize and approximate the value function in a high-dimensional or infinite state space. Macua appl