

First-order sensitivity of the optimal value in a Markov decision model with respect to deviations in the transition probability function

Patrick Kern¹ · Axel Simroth² · Henryk Zähle¹

Received: 23 January 2019 / Revised: 2 September 2019
© The Author(s) 2020

Abstract

Markov decision models (MDM) used in practical applications are most often less complex than the underlying 'true' MDM. The reduction of model complexity is performed for several reasons. However, it is obviously of interest to know what kind of model reduction is reasonable (in regard to the optimal value) and what kind is not. In this article we propose a way to address this question. We introduce a sort of derivative of the optimal value as a function of the transition probabilities, which can be used to measure the (first-order) sensitivity of the optimal value w.r.t. changes in the transition probabilities. 'Differentiability' is obtained for a fairly broad class of MDMs, and the 'derivative' is specified explicitly. Our theoretical findings are illustrated by means of optimization problems in inventory control and mathematical finance.

Keywords Markov decision model · Model reduction · Transition probability function · Optimal value · Functional differentiability · Financial optimization
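To fix ideas, the following minimal sketch (not taken from the paper) illustrates the notion of first-order sensitivity in the simplest possible setting: a hypothetical finite, two-stage MDM is solved by backward induction, and the directional sensitivity of the optimal value with respect to a perturbation of the transition probabilities is approximated by a finite difference. All model data (rewards r, kernel P, perturbation direction D) are invented for illustration; the paper itself develops an exact functional derivative rather than this crude numerical proxy.

    # Minimal numerical sketch (hypothetical data): two-stage MDM with two states
    # and two actions, solved by backward induction; the first-order sensitivity
    # of the optimal value in the direction D is approximated by a finite difference.
    import numpy as np

    states, actions, horizon = 2, 2, 2

    # r[t][s, a]: one-stage reward at time t in state s under action a (hypothetical)
    r = [np.array([[1.0, 0.5], [0.2, 0.8]]) for _ in range(horizon)]

    def optimal_value(P, s0=0):
        """Optimal value V_0(s0) via backward induction.
        P[a][s, s']: transition probability from s to s' under action a."""
        V = np.zeros(states)                      # terminal value V_T = 0
        for t in reversed(range(horizon)):
            Q = np.array([[r[t][s, a] + P[a][s] @ V for a in range(actions)]
                          for s in range(states)])
            V = Q.max(axis=1)                     # maximize over actions
        return V[s0]

    # Baseline transition kernel and a perturbation direction whose rows sum to 0,
    # so that P + eps*D is again a stochastic kernel for small eps
    P = [np.array([[0.7, 0.3], [0.4, 0.6]]), np.array([[0.5, 0.5], [0.1, 0.9]])]
    D = [np.array([[0.1, -0.1], [0.0, 0.0]]), np.zeros((2, 2))]

    eps = 1e-4
    perturbed = [P[a] + eps * D[a] for a in range(actions)]
    sensitivity = (optimal_value(perturbed) - optimal_value(P)) / eps
    print("finite-difference directional sensitivity:", sensitivity)

A small finite-difference value indicates that replacing P by the perturbed kernel barely affects the optimal value, i.e. the corresponding model reduction is harmless to first order; a large value flags a direction of perturbation to which the optimal value is sensitive.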

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00186-020-00706-w) contains supplementary material, which is available to authorized users.

Henryk Zähle (corresponding author): [email protected]
Patrick Kern: [email protected]
Axel Simroth: [email protected]

1 Department of Mathematics, Saarland University, Saarbrücken, Germany
2 Fraunhofer Institute for Transportation and Infrastructure Systems, Dresden, Germany


1 Introduction

Already in the 1990s, Müller (1997a) pointed out that the impact of the transition probabilities of a Markov decision process (MDP) on the optimal value of a corresponding Markov decision model (MDM) cannot be ignored in practice. For instance, in most cases the transition probabilities are unknown and have to be estimated by statistical methods. Moreover, in many applications the 'true' model is replaced by an approximate version or by a simplified, and thus less complex, variant. As a result, in practical applications the optimal strategy, and thus the optimal value, is most often computed on the basis of transition probabilities that differ from the underlying true transition probabilities. The sensitivity of the optimal value w.r.t. deviations in the transition probabilities is therefore obviously of interest. Müller (1997a) showed that under some structural assumptions the optimal value in a discrete-time MDM depends continuously on the transition probabilities, and he established bounds for the approximation error. In doing so, the distance between transition probabilities was measured by means of suitable probability metrics. Even earlier, Kolonko (1983) obtained analogous bounds in an MDM in which the