

First-order sensitivity of the optimal value in a Markov decision model with respect to deviations in the transition probability function

Patrick Kern¹ · Axel Simroth² · Henryk Zähle¹

Received: 23 January 2019 / Revised: 2 September 2019
© The Author(s) 2020

Abstract

Markov decision models (MDM) used in practical applications are most often less complex than the underlying 'true' MDM. The reduction of model complexity is performed for several reasons. However, it is obviously of interest to know what kind of model reduction is reasonable (in regard to the optimal value) and what kind is not. In this article we propose a way to address this question. We introduce a sort of derivative of the optimal value as a function of the transition probabilities, which can be used to measure the (first-order) sensitivity of the optimal value w.r.t. changes in the transition probabilities. 'Differentiability' is obtained for a fairly broad class of MDMs, and the 'derivative' is specified explicitly. Our theoretical findings are illustrated by means of optimization problems in inventory control and mathematical finance.

Keywords Markov decision model · Model reduction · Transition probability function · Optimal value · Functional differentiability · Financial optimization
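To fix ideas, the following minimal sketch (not taken from the paper) illustrates the notion of first-order sensitivity in the simplest possible setting: a hypothetical finite, two-stage MDM is solved by backward induction, and the directional sensitivity of the optimal value with respect to a perturbation of the transition probabilities is approximated by a finite difference. All model data (rewards r, kernel P, perturbation direction D) are invented for illustration; the paper itself develops an exact functional derivative rather than this crude numerical proxy.

    # Minimal numerical sketch (hypothetical data): two-stage MDM with two states
    # and two actions, solved by backward induction; the first-order sensitivity
    # of the optimal value in the direction D is approximated by a finite difference.
    import numpy as np

    states, actions, horizon = 2, 2, 2

    # r[t][s, a]: one-stage reward at time t in state s under action a (hypothetical)
    r = [np.array([[1.0, 0.5], [0.2, 0.8]]) for _ in range(horizon)]

    def optimal_value(P, s0=0):
        """Optimal value V_0(s0) via backward induction.
        P[a][s, s']: transition probability from s to s' under action a."""
        V = np.zeros(states)                      # terminal value V_T = 0
        for t in reversed(range(horizon)):
            Q = np.array([[r[t][s, a] + P[a][s] @ V for a in range(actions)]
                          for s in range(states)])
            V = Q.max(axis=1)                     # maximize over actions
        return V[s0]

    # Baseline transition kernel and a perturbation direction whose rows sum to 0,
    # so that P + eps*D is again a stochastic kernel for small eps
    P = [np.array([[0.7, 0.3], [0.4, 0.6]]), np.array([[0.5, 0.5], [0.1, 0.9]])]
    D = [np.array([[0.1, -0.1], [0.0, 0.0]]), np.zeros((2, 2))]

    eps = 1e-4
    perturbed = [P[a] + eps * D[a] for a in range(actions)]
    sensitivity = (optimal_value(perturbed) - optimal_value(P)) / eps
    print("finite-difference directional sensitivity:", sensitivity)

A small finite-difference value indicates that replacing P by the perturbed kernel barely affects the optimal value, i.e. the corresponding model reduction is harmless to first order; a large value flags a direction of perturbation to which the optimal value is sensitive.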

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00186-020-00706-w) contains supplementary material, which is available to authorized users.

Henryk Zähle (corresponding author): [email protected]
Patrick Kern: [email protected]
Axel Simroth: [email protected]

1 Department of Mathematics, Saarland University, Saarbrücken, Germany
2 Fraunhofer Institute for Transportation and Infrastructure Systems, Dresden, Germany


1 Introduction

Already in the 1990s, Müller (1997a) pointed out that the impact of the transition probabilities of a Markov decision process (MDP) on the optimal value of a corresponding Markov decision model (MDM) cannot be ignored in practice. For instance, in most cases the transition probabilities are unknown and have to be estimated by statistical methods. Moreover, in many applications the 'true' model is replaced by an approximate version or by a simplified, and thus less complex, variant. As a result, in practical applications the optimal strategy, and thus the optimal value, is most often computed on the basis of transition probabilities that differ from the underlying true transition probabilities. The sensitivity of the optimal value w.r.t. deviations in the transition probabilities is therefore obviously of interest. Müller (1997a) showed that under some structural assumptions the optimal value in a discrete-time MDM depends continuously on the transition probabilities, and he established bounds for the approximation error. In doing so, the distance between transition probabilities was measured by means of suitable probability metrics. Even earlier, Kolonko (1983) obtained analogous bounds in an MDM in which the