Competitive Markov Decision Processes

This book is intended as a text covering the central concepts and techniques of Competitive Markov Decision Processes. It is an attempt to present a rigorous treatment that combines two significant research topics: Stochastic Games and Markov Decision Processes.



2.0 Introduction

This chapter is devoted to the presentation of the basic theory of finite state/finite action Markov decision processes. Of course, in the context of this book, the entire subject of Markov decision processes forms only a special case of competitive Markov decision processes, that is, Stochastic Games: namely, the case where there is only one controller, and hence the underlying mathematical problem is "merely" an optimization problem.

The perspective from which Markov decision processes are discussed in this chapter is that of mathematical programming. This is consistent with the spirit of Part I of the whole book and, to the best of our knowledge, is the first such complete treatment in a textbook (rather than a research monograph) format. Consequently, many of the important properties of Markov decision processes are derived using the techniques of linear programming, and the guiding principle is to view the underlying optimal control problems in the generic mathematical programming framework of:

Find a control that maximizes (objective function)
subject to: satisfaction of feasibility constraints.

We feel that this perspective is important because it links the subject of Markov decision processes with the very rich subject of mathematical programming.

J. Filar et al., Competitive Markov Decision Processes, © Springer-Verlag New York, Inc., 1997
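To make the generic "maximize subject to feasibility constraints" framework concrete, the following is a sketch of the standard linear program associated with a discounted Markov decision process. The notation here is an assumption for illustration (discount factor β ∈ [0, 1), rewards r(s, a), and transition probabilities p(s' | s, a)); the precise formulation is developed later in the chapter.

```latex
% A linear-programming view of a discounted MDP (sketch):
% the optimal value vector v = (v_1, \dots, v_N) solves
\begin{aligned}
\min_{v} \quad & \sum_{s \in S} v_s \\
\text{subject to} \quad
& v_s \;\ge\; r(s,a) + \beta \sum_{s' \in S} p(s' \mid s, a)\, v_{s'},
  \qquad \forall\, s \in S,\; a \in A(s).
\end{aligned}
```

Here the "control" is recovered from the constraints that hold with equality at the optimum: in each state, an optimal action is one whose constraint is tight.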


2. Markov Decision Processes: The Noncompetitive Case

As it stands, the chapter is a self-contained introduction to the theory of Markov decision processes. The first five sections, while rigorous, present only the cases that can be settled by elementary arguments and should be accessible to undergraduate students in mathematics, engineering, and economics. The remainder of the chapter is presented at a somewhat more advanced level that might be more suitable for first-year graduate students in the above-mentioned disciplines. Nonetheless, the entire chapter requires only an undergraduate level of mathematics as a prerequisite. Many of these prerequisites are also discussed briefly in the appendices.

2.1 The Summable Markov Decision Processes

We shall consider a process Γ that is observed at discrete time points t = 0, 1, 2, 3, ... that will sometimes be called stages. At each time point t, the state of the process will be denoted by St. We shall assume that St is a random variable that can take on values from the finite set S = {1, 2, ..., N}, which from now on will be called the state space. The phrase "the process is in state s at time t" will be synonymous with the event {St = s}. We shall assume that the process is controlled by a controller or a decision-maker who chooses an action a ∈ A(s) = {1, 2, ..., m(s)} at time t if the process is in state s at that time. We may regard the action chosen as a realization of a random variable At denoting the controller's choice at time t. Furthermore, we shall assume that the choice of a ∈ A(s) in state s results in an immediate reward or output r(s, a), and in a probabilistic transition to a
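The primitives just introduced — the state space S = {1, ..., N}, the action sets A(s), the immediate rewards r(s, a), and the transition probabilities — can be collected into a small data structure and simulated stage by stage. The following is a minimal sketch in Python; the two-state example and all its numbers are hypothetical, chosen only to illustrate the definitions, and are not taken from the text.

```python
import random

# A finite MDP: states are 1..N; actions available in state s are 1..m(s).
# rewards[(s, a)] is the immediate reward r(s, a).
# transitions[(s, a)] maps each next state s' to the probability p(s' | s, a).
mdp = {
    "states": [1, 2],
    "actions": {1: [1, 2], 2: [1]},           # A(1) = {1, 2}, A(2) = {1}
    "rewards": {(1, 1): 5.0, (1, 2): 1.0, (2, 1): 0.0},
    "transitions": {
        (1, 1): {1: 0.2, 2: 0.8},
        (1, 2): {1: 0.9, 2: 0.1},
        (2, 1): {1: 0.5, 2: 0.5},
    },
}

def step(mdp, s, a, rng=random):
    """Simulate one stage: return the reward r(s, a) and a next state
    drawn according to the transition probabilities p(. | s, a)."""
    reward = mdp["rewards"][(s, a)]
    row = mdp["transitions"][(s, a)]
    u, cum = rng.random(), 0.0
    for s_next, p in row.items():
        cum += p
        if u < cum:
            return reward, s_next
    return reward, s_next  # guard against floating-point rounding

# Sanity check: every transition row is a probability distribution.
for (s, a), row in mdp["transitions"].items():
    assert abs(sum(row.values()) - 1.0) < 1e-12
```

A run of the process Γ is then a sequence of such steps, with the controller choosing a ∈ A(St) at each stage t.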