RESEARCH ARTICLE
Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning

Xiaoxiao Zhao1 · Peng Yi1,2 · Li Li1,2

Received: 8 April 2020 / Revised: 29 July 2020 / Accepted: 31 July 2020
© South China University of Technology, Academy of Mathematics and Systems Science, CAS and Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
This paper studies distributed policy evaluation in multi-agent reinforcement learning. Under cooperative settings, each agent obtains only a local reward, while all agents share a common environmental state. To optimize the global return, defined as the sum of the local returns, the agents exchange information with their neighbors through a communication network. The mean squared projected Bellman error minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint; then, a distributed alternating direction method of multipliers (ADMM) algorithm is proposed to solve it. Furthermore, an inexact step for ADMM is used to achieve efficient computation at each iteration. The convergence of the proposed algorithm is established.

Keywords Multi-agent system · Reinforcement learning · Distributed optimization · Policy evaluation
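To make the approach described above concrete, the following is a minimal sketch (not the paper's algorithm) of inexact consensus ADMM on a toy least-squares problem: each agent holds a local quadratic loss, the consensus constraint forces all local copies to agree, and the inexact step replaces the exact local minimization with a single gradient step on the local augmented Lagrangian. The problem data A_i, b_i and the parameters rho, eta, T are illustrative assumptions, and the consensus update below uses a global average, whereas a fully distributed implementation would rely on neighbor-only communication.

# Inexact consensus ADMM sketch on min_x sum_i 0.5*||A_i x - b_i||^2
# subject to all agents agreeing on x (illustrative toy problem).
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 3                       # number of agents, parameter dimension
A = [rng.standard_normal((10, d)) for _ in range(N)]
b = [rng.standard_normal(10) for _ in range(N)]

rho, eta, T = 1.0, 0.05, 500      # penalty, inexact step size, iterations
x = np.zeros((N, d))              # local primal estimates
y = np.zeros((N, d))              # dual variables
z = np.zeros(d)                   # consensus variable

for _ in range(T):
    for i in range(N):
        # inexact x-update: one gradient step on the local augmented Lagrangian
        grad = A[i].T @ (A[i] @ x[i] - b[i]) + y[i] + rho * (x[i] - z)
        x[i] -= eta * grad
    # consensus update (global average here; the distributed version
    # would replace this with exchanges over the communication network)
    z = np.mean(x + y / rho, axis=0)
    # dual ascent on the consensus constraint
    y += rho * (x - z)

print("consensus estimate:", z)

Running the loop longer, or using a smaller step size eta, trades computation per iteration against accuracy, which is exactly the tension the inexact step is meant to address.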
1 Introduction

In reinforcement learning [1], an agent tries to achieve the optimal policy through interaction with the environment. Multi-agent reinforcement learning (MARL) [2], which extends reinforcement learning to multi-agent systems, has received great attention for complex tasks such as resource allocation, intelligent transportation systems, and scheduling [3–6]. Since agents interact not only with the environment but also with other agents, MARL is considerably more challenging, facing issues such as the curse of dimensionality, non-stationary environments, and global exploration [7].
* Li Li [email protected]

Xiaoxiao Zhao [email protected]

Peng Yi [email protected]

1 College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China

2 Institute of Intelligent Science and Technology, Tongji University, Shanghai 201203, China
In this paper, we study collaborative MARL. Under cooperative settings, agents share a global state, but each agent can only observe its local reward, and the goal is to jointly maximize the globally averaged return of all agents. A straightforward approach to collaborative MARL is to use a central controller that collects the rewards of all agents and determines the action for each agent. However, a central controller can be too expensive to deploy or susceptible to attacks, and some agents may be unwilling to leak or share their local information due to privacy and security requirements [8]. Therefore, we consider a fully distributed method, in which agents share information with their neighbors through a communication network and make decisions based on their local rewards and the information received from their neighbors. We focus on the policy evaluation problem for collaborative MARL.
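As a toy illustration of this fully distributed setting (an assumption for exposition, not the paper's protocol), the snippet below shows how agents could track the globally averaged reward purely by mixing estimates with their neighbors over a fixed communication graph, using a doubly stochastic weight matrix W; no central collector ever sees all local rewards.

# Neighbor-only averaging of local rewards over a ring graph (illustrative).
import numpy as np

# Metropolis-style doubly stochastic weights for a 5-agent ring
W = np.array([
    [0.50, 0.25, 0.00, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.00, 0.25, 0.50],
])

local_rewards = np.array([1.0, 3.0, 5.0, 2.0, 4.0])   # observed privately
estimates = local_rewards.copy()

for _ in range(50):                  # consensus rounds
    estimates = W @ estimates        # each agent mixes neighbors' estimates

print("true global average:", local_rewards.mean())
print("local estimates after consensus:", estimates)

Because W is doubly stochastic and the graph is connected, every local estimate converges to the global average, which is the quantity the cooperative agents aim to optimize.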
1.1 Related work

Existing works on multi-agent reinforcement learning