
EXTREME LEARNING MACHINE AND DEEP LEARNING NETWORKS

Reinforcement learning and adaptive optimization of a class of Markov jump systems with completely unknown dynamic information

Shuping He1,2 • Maoguang Zhang1 • Haiyang Fang1 • Fei Liu3 • Xiaoli Luan3 • Zhengtao Ding4

Received: 29 November 2018 / Accepted: 29 March 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019

Abstract

In this paper, the online adaptive optimal control problem for a class of continuous-time Markov jump linear systems (MJLSs) with completely unknown dynamics is investigated using a parallel reinforcement learning (RL) algorithm. Before the state and input information of the subsystems is collected and learned, exploration noise is first added to describe the actual control input. Then, a novel parallel RL algorithm is used to solve the corresponding N coupled algebraic Riccati equations in parallel by online learning. With this algorithm, the dynamic information of the MJLSs does not need to be known. The convergence of the proposed algorithm is also proved. Finally, the effectiveness and applicability of this novel algorithm are illustrated by two simulation examples.

Keywords Markov jump linear systems (MJLSs) • Adaptive optimal control • Online • Reinforcement learning (RL) • Coupled algebraic Riccati equations (AREs)
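For orientation, the N coupled algebraic Riccati equations named in the abstract are commonly written as follows in the MJLS literature. This is a sketch under assumed notation, not necessarily the notation adopted in the body of the paper: the mode-dependent system matrices A_i and B_i, the weighting matrices Q_i and R_i, the transition rates \pi_{ij}, and the solutions P_i are all symbols introduced here for illustration.

```latex
% Assumed notation (not taken from the text above):
% the mode r(t) \in \{1,\dots,N\} is a Markov chain with transition rates \pi_{ij},
% where \pi_{ii} = -\sum_{j \neq i} \pi_{ij}.
\begin{align}
  \dot{x}(t) &= A_{r(t)}\, x(t) + B_{r(t)}\, u(t), \\
  J &= \mathbb{E}\!\left[ \int_0^{\infty} \left( x^{\top} Q_{r(t)}\, x + u^{\top} R_{r(t)}\, u \right) dt \right], \\
  0 &= A_i^{\top} P_i + P_i A_i + Q_i - P_i B_i R_i^{-1} B_i^{\top} P_i + \sum_{j=1}^{N} \pi_{ij} P_j,
  \qquad i = 1, \dots, N, \\
  u(t) &= -K_i\, x(t), \qquad K_i = R_i^{-1} B_i^{\top} P_i \quad \text{when } r(t) = i.
\end{align}
```

The summation term is what couples the N equations: each P_i depends on every other P_j through the transition rates, so the equations cannot be solved one mode at a time in isolation.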

Corresponding author: Shuping He

1 School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
2 Institute of Physical Science and Information Technology, Anhui University, Hefei 230601, China
3 Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Institute of Automation, Jiangnan University, Wuxi 214122, China
4 School of Electrical and Electronic Engineering, The University of Manchester, Manchester M13 9PL, UK

1 Introduction

Markov jump linear systems (MJLSs), first proposed by Krasovskii and Lidskii [1] in 1961, can be regarded as a class of multi-model stochastic systems. An MJLS contains two mechanisms, i.e., the modes and the states. The modes are jumping dynamics, modeled by finite-state Markov chains. The states are continuous or discrete, modeled by a set of differential or difference equations. With the development of control science and stochastic theory, MJLSs have attracted wide attention and many research results are available, such as stochastic stability and stabilizability [2–4], controllability [5–9] and robust estimation and filtering [10–12]. In recent years, the adaptive optimal control problem has become a focus of controller design, and many related works have been published. For example, the authors in [13] studied adaptive surface optimal control methods for strict-feedback systems. Then, an observer-based adaptive fuzzy control law was proposed for nonlinear nonstrict-feedback systems [14]. A general method