Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems



ORIGINAL ARTICLE

Sholeh Yasini • Mohammad Bagher Naghibi Sitani • Ali Kirampor



Received: 12 February 2014 / Accepted: 18 September 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract This paper presents an online adaptive optimal control method based on reinforcement learning to solve multi-agent nonzero-sum (NZS) differential games of nonlinear constrained-input continuous-time systems. A non-quadratic cost functional associated with each agent is employed to encode the saturation nonlinearity into the NZS game. The algorithm is implemented as a separate actor-critic neural network (NN) structure for every participant in the game, where adaptation of both NNs is performed simultaneously and continuously. The technique of concurrent learning is utilized to obtain novel update laws for the critic NN weights; that is, recorded data and current data are used concurrently for adaptation of the critic NN weights. This results in an algorithm in which an easier, verifiable condition is sufficient for parameter convergence, rather than the restrictive persistence of excitation (PE) condition. The stability of the closed-loop system is guaranteed, and convergence to the Nash equilibrium solution of the game is shown. Simulation results show the effectiveness of the proposed method.

Keywords Concurrent reinforcement learning • Coupled Hamilton–Jacobi equations • Input constraints • Multi-agent nonzero-sum games • Neural networks

S. Yasini • M. B. Naghibi Sitani (✉) • A. Kirampor
Department of Electrical Engineering, Ferdowsi University of Mashhad, Mashhad 91775-1111, Iran
e-mail: [email protected]

S. Yasini
e-mail: [email protected]; [email protected]

1 Introduction

The field of optimal control theory was developed as an approach to analytically determine a control policy that will satisfy the physical constraints of the system, while also minimizing a cost function. Optimal control has been extensively developed for classes of systems where a single input parameter or agent influences the system. However, many practical systems are controlled by more than one agent or controller, such as networking and wireless communication systems [1] and large-scale systems [2]. For such multi-agent systems, game theory offers a natural extension of the dynamic programming solution to the optimal control problem [3]. Nonzero-sum (NZS) differential games [4, 5] are a generalization of optimal control theory to situations where more than one agent or controller makes decisions to control a single nonlinear system, each trying to minimize its individual performance criterion in a Nash equilibrium sense. For nonlinear dynamical systems, finding the Nash equilibrium of the NZS game is equivalent to calculating the solution to the coupled Hamilton–Jacobi (HJ) equations. However, solving the coupled HJ equations is a very difficult problem. Thus, several approximate methods have been present