A Discounted Approach in Communicating Average Markov Decision Chains Under Risk-Aversion



Julio Saucedo-Zul¹ · Rolando Cavazos-Cadena² · Hugo Cruz-Suárez¹

Received: 17 May 2020 / Accepted: 24 September 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract  This work concerns discrete-time Markov decision processes on a denumerable state space. Assuming that the decision maker is risk-averse with a constant risk-sensitivity coefficient, the performance of a control policy is measured by an average criterion associated with a non-negative and bounded cost function. Under conditions ensuring that the optimal average cost is constant, but not necessarily determined via the average cost optimality equation, it is shown that a discounted criterion can be used to approximate the optimal average index.

Keywords  Contractive operator · Fixed point · Equivalence of inferior and superior limit average criteria · Exponential utility · Compact support

Mathematics Subject Classification  93E20 · 93C55 · 60J05
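For concreteness, the risk-averse average criterion described in the abstract can be stated in the exponential-utility form that is standard in the risk-sensitive MDP literature. The notation below (λ for the risk-sensitivity coefficient, C for the cost function, X_t and A_t for the state and action processes) follows the usual convention in this area and is not quoted verbatim from the paper:

```latex
% Standard (superior limit) risk-sensitive average cost of a policy \pi
% starting at state x, for a fixed risk-sensitivity coefficient \lambda > 0:
\[
  J(\lambda, x, \pi)
    \;=\; \limsup_{n \to \infty} \frac{1}{\lambda n}
      \log \mathbb{E}^{\pi}_{x}\!\left[
        \exp\!\Bigl( \lambda \sum_{t=0}^{n-1} C(X_t, A_t) \Bigr) \right],
\]
% and the optimal average cost is obtained by minimizing over policies:
\[
  J^{*}(\lambda, x) \;=\; \inf_{\pi} J(\lambda, x, \pi).
\]
```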

Communicated by Bruno Bouchard.

Corresponding author: Rolando Cavazos-Cadena, [email protected] · Julio Saucedo-Zul, [email protected] · Hugo Cruz-Suárez, [email protected]

1  Facultad de Ciencias Físico-Matemáticas, Benemérita Universidad Autónoma de Puebla, Puebla, PUE, Mexico

2  Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Saltillo, COAH, Mexico

Journal of Optimization Theory and Applications

1 Introduction

This note concerns discrete-time Markov decision processes (MDPs) evolving on a denumerable state space. The system is driven by a decision maker with a constant and positive risk-sensitivity coefficient, and the overall performance of a control policy is measured by the (superior limit) risk-sensitive average criterion associated with a (non-negative and) bounded cost function. Besides standard continuity-compactness conditions, the framework of the paper is determined by two requirements on the transition law: (i) the state space is communicating under the action of every stationary policy, and (ii) the simultaneous Doeblin condition holds. Within this context, it was recently shown that the optimal average cost is not necessarily determined via the usual optimality equation [1]. In contrast, for a discounted index associated with an auxiliary zero-sum game, the optimal value function is the fixed point of a contractive operator and is therefore characterized via a single optimality equation. Thus, it is natural to study the following 'discounted approach' problem:

• To approximate the optimal risk-sensitive average cost via the fixed points associated with a class of discounted operators.

A solution to this problem is well known in the risk-neutral case [2,3], and the results of this work extend the conclusions in [4], where it was assumed that the risk-sensitivity coefficient is sufficiently small, and in [5], where models with a finite state space were considered. The above question naturally leads to the study of the following problem
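The fixed-point mechanism behind the discounted approach can be illustrated numerically. The sketch below is entirely hypothetical and not taken from the paper: it uses a toy two-state, two-action model with invented costs, transition probabilities, risk-sensitivity coefficient `lam`, and discount factor `alpha`. It shows only the general principle that a discounted risk-sensitive Bellman operator is a sup-norm contraction with modulus `alpha`, so iterating it converges to a unique fixed point.

```python
import math

# Toy illustration (not from the paper): value iteration for a
# discounted risk-sensitive operator
#   (T v)(x) = min_a (1/lam) * log( sum_y p(y|x,a) * exp(lam*(c(x,a) + alpha*v(y))) ).
# Since T is monotone and satisfies T(v + k) = T(v) + alpha*k for constants k,
# it is a sup-norm contraction with modulus alpha, so iteration converges
# to its unique fixed point.

lam = 0.5      # risk-sensitivity coefficient (assumed positive)
alpha = 0.9    # discount factor in (0, 1)

states = [0, 1]
actions = [0, 1]
# hypothetical bounded, non-negative costs c[x][a]
c = [[1.0, 0.5], [0.0, 2.0]]
# hypothetical transition law: p[x][a] is a distribution over next states
p = [[[0.7, 0.3], [0.2, 0.8]],
     [[0.5, 0.5], [0.9, 0.1]]]

def T(v):
    """Apply the discounted risk-sensitive Bellman operator once."""
    out = []
    for x in states:
        vals = []
        for a in actions:
            s = sum(p[x][a][y] * math.exp(lam * (c[x][a] + alpha * v[y]))
                    for y in states)
            vals.append(math.log(s) / lam)
        out.append(min(vals))
    return out

def fixed_point(tol=1e-10, max_iter=10000):
    """Iterate T from v = 0 until successive iterates are within tol."""
    v = [0.0 for _ in states]
    for _ in range(max_iter):
        w = T(v)
        if max(abs(w[i] - v[i]) for i in states) < tol:
            return w
        v = w
    return v

v_star = fixed_point()   # unique fixed point of T on this toy model
```

After convergence, `T(v_star)` agrees with `v_star` up to the tolerance, which is the single-optimality-equation characterization the discounted index enjoys, in contrast with the average criterion.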