RESEARCH ARTICLE
Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning

Xiaoxiao Zhao1 · Peng Yi1,2 · Li Li1,2

Received: 8 April 2020 / Revised: 29 July 2020 / Accepted: 31 July 2020
© South China University of Technology, Academy of Mathematics and Systems Science, CAS and Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
This paper studies distributed policy evaluation in multi-agent reinforcement learning. Under cooperative settings, each agent obtains only a local reward, while all agents share a common environmental state. To optimize the global return, defined as the sum of the local returns, the agents exchange information with their neighbors through a communication network. The mean squared projected Bellman error minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint; then, a distributed alternating direction method of multipliers (ADMM) algorithm is proposed to solve it. Furthermore, an inexact step for ADMM is used to achieve efficient computation at each iteration. The convergence of the proposed algorithm is established.

Keywords Multi-agent system · Reinforcement learning · Distributed optimization · Policy evaluation
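To make the approach described above concrete, the following is a minimal sketch (not the paper's algorithm) of inexact consensus ADMM on a toy least-squares problem: each agent holds a local quadratic loss, the consensus constraint forces all local copies to agree, and the inexact step replaces the exact local minimization with a single gradient step on the local augmented Lagrangian. The problem data A_i, b_i and the parameters rho, eta, T are illustrative assumptions, and the consensus update below uses a global average, whereas a fully distributed implementation would rely on neighbor-only communication.

# Inexact consensus ADMM sketch on min_x sum_i 0.5*||A_i x - b_i||^2
# subject to all agents agreeing on x (illustrative toy problem).
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 3                       # number of agents, parameter dimension
A = [rng.standard_normal((10, d)) for _ in range(N)]
b = [rng.standard_normal(10) for _ in range(N)]

rho, eta, T = 1.0, 0.05, 500      # penalty, inexact step size, iterations
x = np.zeros((N, d))              # local primal estimates
y = np.zeros((N, d))              # dual variables
z = np.zeros(d)                   # consensus variable

for _ in range(T):
    for i in range(N):
        # inexact x-update: one gradient step on the local augmented Lagrangian
        grad = A[i].T @ (A[i] @ x[i] - b[i]) + y[i] + rho * (x[i] - z)
        x[i] -= eta * grad
    # consensus update (global average here; the distributed version
    # would replace this with exchanges over the communication network)
    z = np.mean(x + y / rho, axis=0)
    # dual ascent on the consensus constraint
    y += rho * (x - z)

print("consensus estimate:", z)

Running the loop longer, or using a smaller step size eta, trades computation per iteration against accuracy, which is exactly the tension the inexact step is meant to address.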
1 Introduction

In reinforcement learning [1], an agent tries to achieve the optimal policy through interaction with the environment. Multi-agent reinforcement learning (MARL) [2], which extends reinforcement learning to multi-agent systems, has received great attention for complex tasks such as resource allocation, intelligent transportation systems, and scheduling [3–6]. Since agents interact not only with the environment but also with other agents, MARL is considerably more challenging, facing issues such as the curse of dimensionality, non-stationary environments, and global exploration [7].
* Li Li [email protected]

Xiaoxiao Zhao [email protected]

Peng Yi [email protected]

1 College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China

2 Institute of Intelligent Science and Technology, Tongji University, Shanghai 201203, China
In this paper, we study collaborative MARL. Under cooperative settings, agents share a global state, but each agent can only observe its local reward, and the goal is to jointly maximize the globally averaged return of all agents. A straightforward approach to collaborative MARL is to use a central controller that collects the rewards of all agents and determines the action for each agent. However, a central controller can be too expensive to deploy or susceptible to attacks, and some agents may be unwilling to leak or share their local information due to privacy and security requirements [8]. Therefore, we consider a fully distributed method, in which agents share information with their neighbors through a communication network and make decisions based on their local rewards and the information received from their neighbors. We focus on the policy evaluation problem for collaborative MARL.
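As a toy illustration of this fully distributed setting (an assumption for exposition, not the paper's protocol), the snippet below shows how agents could track the globally averaged reward purely by mixing estimates with their neighbors over a fixed communication graph, using a doubly stochastic weight matrix W; no central collector ever sees all local rewards.

# Neighbor-only averaging of local rewards over a ring graph (illustrative).
import numpy as np

# Metropolis-style doubly stochastic weights for a 5-agent ring
W = np.array([
    [0.50, 0.25, 0.00, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.00, 0.25, 0.50],
])

local_rewards = np.array([1.0, 3.0, 5.0, 2.0, 4.0])   # observed privately
estimates = local_rewards.copy()

for _ in range(50):                  # consensus rounds
    estimates = W @ estimates        # each agent mixes neighbors' estimates

print("true global average:", local_rewards.mean())
print("local estimates after consensus:", estimates)

Because W is doubly stochastic and the graph is connected, every local estimate converges to the global average, which is the quantity the cooperative agents aim to optimize.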
1.1 Related work

Existing works on multi-agent reinforcement learning