Reinforcement learning algorithm for non-stationary environments


Sindhu Padakandla · Prabuchandran K. J. · Shalabh Bhatnagar
Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
Contact: Sindhu Padakandla [email protected] · Prabuchandran K. J. [email protected] · Shalabh Bhatnagar [email protected]

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary environment. However, the stationarity assumption on the environment is very restrictive. In many real-world problems, such as traffic signal control and robotic applications, one often encounters situations with non-stationary environments, and in these scenarios RL methods yield sub-optimal decisions. In this paper, we thus consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment. The goal is to maximize the long-term discounted reward accrued when the underlying model of the environment changes over time. To achieve this, we first adapt a change point detection algorithm to detect changes in the statistics of the environment, and then develop an RL algorithm that maximizes the long-run reward accrued. We show that our change point method effectively detects changes in the model of the environment and thereby facilitates the RL algorithm in maximizing the long-run reward. We further validate the effectiveness of the proposed solution on non-stationary random Markov decision processes, a sensor energy management problem, and a traffic signal control problem.

Keywords: Markov decision processes · Reinforcement learning · Non-stationary environments · Change detection
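As a concrete illustration of the approach summarized above, the following is a minimal sketch (in Python) of how a change point detector can be coupled with tabular Q-learning. It is not the authors' algorithm: the environment interface (reset/step), the sliding-window mean-shift test, and the choice to reset the Q-table on a detected change are all illustrative assumptions standing in for the paper's actual change point detection method and adaptation strategy.

    import numpy as np

    def q_learning_with_change_detection(env, n_states, n_actions, steps=10000,
                                         alpha=0.1, gamma=0.95, epsilon=0.1,
                                         window=50, threshold=3.0):
        # Tabular Q-learning that discards stale value estimates whenever a
        # shift in the reward statistics is flagged. Assumes env.reset()
        # returns an integer state and env.step(a) returns
        # (next_state, reward, done).
        Q = np.zeros((n_states, n_actions))
        recent, reference = [], None
        s = env.reset()
        for _ in range(steps):
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # standard Q-learning update
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            # crude change detection: compare the mean reward of the current
            # window against the previous window (a stand-in statistic only)
            recent.append(r)
            if len(recent) == window:
                mean, std = np.mean(recent), np.std(recent) + 1e-8
                if reference is not None and abs(mean - reference) / std > threshold:
                    Q[:] = 0.0  # model presumably changed: restart learning
                reference = mean
                recent = []
            s = env.reset() if done else s_next
        return Q

Resetting the Q-table is only the simplest possible reaction to a detected change; more refined strategies, such as maintaining separate value estimates per detected environment model, are closer in spirit to what a method of the kind proposed in this paper would do.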

1 Introduction

Autonomous agents are increasingly being designed for sequential decision-making tasks under uncertainty in various domains. For example, in traffic signal control [34], an autonomous agent decides on the green signal duration for all lanes at a traffic junction, while in robotic applications, human-like robotic agents are built to dexterously manipulate physical objects [3, 29]. The common aspect in these applications is that the state of the system evolves based on the decisions made by the agent. In traffic signal control, for instance, the state is the vector of current congestion levels at the various lanes of a junction and the agent decides on the green signal duration for all lanes at the junction, while in a robotic application, the state can be the motor angles of the joints, and the robot decides on the torque for all motors. The key aspect is that a decision by the agent affects the immediate next state of the system and the reward (or cost) obtained, as well as the future states. Further, the sequence of decisions made by the agent is ranked according to a fixed performance criterion, which is a function of the rewards obtained for all decisions made. The central problem in sequential decision-making is that the agent must find a sequence of decisions for every state such that this performance criterion is maximized.
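To make this criterion precise, a standard choice, and the one the abstract refers to as the long-term discounted reward, is the infinite-horizon discounted objective. For a policy \pi mapping states to decisions,

    \[
    V^{\pi}(s) \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \;\middle|\; s_0 = s,\; a_t = \pi(s_t)\right], \qquad 0 \le \gamma < 1,
    \]

where r(s_t, a_t) is the reward obtained at time t and \gamma is the discount factor. The agent seeks a policy \pi^* satisfying V^{\pi^*}(s) \ge V^{\pi}(s) for every state s and every policy \pi. In the non-stationary setting studied in this paper, the transition and reward model generating these quantities may itself change over time, which is precisely what makes the standard stationary-MDP treatment yield sub-optimal decisions.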
