Markov decision processes with quasi-hyperbolic discounting
Anna Jaśkiewicz¹ · Andrzej S. Nowak²
Received: 26 May 2019 / Accepted: 19 August 2020 © The Author(s) 2020
Abstract
We study Markov decision processes with Borel state spaces under quasi-hyperbolic discounting. This type of discounting nicely models human behaviour, which is time-inconsistent in the long run. The decision maker has preferences changing in time. Therefore, the standard approach based on the Bellman optimality principle fails. Within a dynamic game-theoretic framework, we prove the existence of randomised stationary Markov perfect equilibria for a large class of Markov decision processes with transitions having a density function. We also show that randomisation can be restricted to two actions in every state of the process. Moreover, we prove that under some conditions, this equilibrium can be replaced by a deterministic one. For models with countable state spaces, we establish the existence of deterministic Markov perfect equilibria. Many examples are given to illustrate our results, including a portfolio selection model with quasi-hyperbolic discounting.
Keywords: Markov decision process · Markov perfect equilibrium · Stochastic economic growth
Mathematics Subject Classification (2010): 60J20 · 91A10 · 91A13 · 91A25 · 91B51 · 91B62 · 91G10 · 91G80
JEL Classification: C61 · C72 · C73 · G11
The authors acknowledge financial support from the National Science Centre, Poland: Grant 2016/23/B/ST1/00425.
¹ Faculty of Pure and Applied Mathematics, Wrocław University of Science and Technology, Wyspiańskiego 27, 50-370 Wrocław, Poland
² Faculty of Mathematics, Computer Science and Econometrics, University of Zielona Góra, Licealna 9, 65-417 Zielona Góra, Poland
1 Introduction
The discounted utility approach in dynamic decision making has been used since the beginning of modern economic theory; see e.g. Samuelson [59]. It is based on the assumption that the discount rate is constant over time. In that way, it is possible to compare outcomes occurring at different times by discounting future utility by some constant factor. A decision maker using high discount rates exhibits more impatience than one with low discount rates.
It should be noted, however, that there is growing evidence that standard (geometric) discounting is not adequate in many real-life situations; see e.g. Ainslie [2]. When discounting is non-standard, the decision maker becomes time-inconsistent, that is, a policy chosen as optimal at the beginning of the decision process is no longer optimal when it is reconsidered as a policy for the process from some later point in time onwards. It is said that the decision maker possesses changing time preferences or that his utilities change over time. For example, consider a consumption/saving problem in discrete time. Suppose that the decision maker plans to save a lot tomorrow, but as tomorrow comes, he reconsiders his pre