Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication
Emanuele Pesce1 · Giovanni Montana1

Received: 21 January 2019 / Revised: 28 October 2019 / Accepted: 6 December 2019
© The Author(s) 2020
Abstract
Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with tasks requiring coordination and synchronisation skills, inter-agent communication plays an essential role. In this work, we propose a framework for multi-agent training using deep deterministic policy gradients that enables concurrent, end-to-end learning of an explicit communication protocol through a memory device. During training, the agents learn to perform read and write operations enabling them to infer a shared representation of the world. We empirically demonstrate that concurrent learning of the communication device and individual policies can improve inter-agent coordination and performance in small-scale systems. Our experimental results show that the proposed method achieves superior performance in scenarios with up to six agents. We illustrate how different communication patterns can emerge on six different tasks of increasing complexity. Furthermore, we study the effects of corrupting the communication channel, provide a visualisation of the time-varying memory content as the underlying task is being solved and validate the building blocks of the proposed memory device through ablation studies.

Keywords Reinforcement learning · Multi-agent systems · Artificial neural networks
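To make the read/write mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of agents sharing a learned memory channel. The module names (MemoryChannel, Agent), the gated update rule and all dimensions are illustrative assumptions chosen for clarity, not the architecture proposed in this paper.

```python
import torch
import torch.nn as nn

class MemoryChannel(nn.Module):
    """Learned write operation over a shared memory vector (illustrative)."""
    def __init__(self, obs_dim: int, mem_dim: int = 32):
        super().__init__()
        self.write = nn.Linear(obs_dim + mem_dim, mem_dim)  # candidate content
        self.gate = nn.Sequential(
            nn.Linear(obs_dim + mem_dim, mem_dim), nn.Sigmoid()
        )  # per-dimension write strength

    def forward(self, obs: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        x = torch.cat([obs, memory], dim=-1)
        w = torch.tanh(self.write(x))      # proposed write vector
        g = self.gate(x)                   # how much of the memory to overwrite
        return (1.0 - g) * memory + g * w  # gated memory update

class Agent(nn.Module):
    """Reads the shared memory, updates it, and acts on observation + memory."""
    def __init__(self, obs_dim: int, act_dim: int, mem_dim: int = 32):
        super().__init__()
        self.channel = MemoryChannel(obs_dim, mem_dim)
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + mem_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh()  # continuous actions, DDPG-style
        )

    def forward(self, obs, memory):
        memory = self.channel(obs, memory)                      # write
        action = self.policy(torch.cat([obs, memory], dim=-1))  # read and act
        return action, memory

# Agents take turns: each one reads the memory left by the previous agent.
obs_dim, act_dim, mem_dim, n_agents = 8, 2, 32, 2
agents = [Agent(obs_dim, act_dim, mem_dim) for _ in range(n_agents)]
memory = torch.zeros(1, mem_dim)       # shared channel, reset at episode start
for agent in agents:
    obs = torch.randn(1, obs_dim)      # placeholder partial observation
    action, memory = agent(obs, memory)
```

The gated update lets each agent decide, dimension by dimension, how much of the previous content to preserve; because every operation is differentiable, the communication protocol can be learned end-to-end together with the individual policies, which is the property the proposed framework relies on.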
Editors: Karsten Borgwardt, Po-Ling Loh, Evimaria Terzi, Antti Ukkonen.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10994-019-05864-5) contains supplementary material, which is available to authorized users.
Emanuele Pesce: [email protected]
Giovanni Montana: [email protected]
1 WMG, University of Warwick, Coventry CV4 7AL, UK
1 Introduction

Reinforcement Learning (RL) allows agents to learn how to map observations to actions through feedback reward signals (Sutton and Barto 1998). Recently, deep neural networks (LeCun et al. 2015; Schmidhuber 2015) have had a noticeable impact on RL (Li 2017). They provide flexible models for learning value functions and policies, help to overcome difficulties arising from large state spaces, and eliminate the need for hand-crafted features and ad hoc heuristics (Cortes et al. 2002; Parker et al. 2003; Olfati-Saber et al. 2007). Deep reinforcement learning (DRL) algorithms, which typically rely on deep neural networks to approximate functions, have been successfully employed in single-agent systems, including video game playing (Mnih et al. 2015), robot locomotion (Lillicrap et al. 2015), object localisation (Caicedo and Lazebnik 2015) and data-center cooling (Evans and Gao 2016). Following the uptake of DRL in