Markov Decision Processes with Multiple Long-Run Average Objectives
Abstract. We consider Markov decision processes (MDPs) with multiple long-run average objectives. Such MDPs occur in design problems where one wishes to simultaneously optimize several criteria, for example, latency and power. The possible trade-offs between the different objectives are characterized by the Pareto curve. We show that every Pareto optimal point can be ε-approximated by a memoryless strategy, for all ε > 0. In contrast to the single-objective case, the memoryless strategy may require randomization. We show that the Pareto curve can be approximated (a) in polynomial time in the size of the MDP for irreducible MDPs; and (b) in polynomial space in the size of the MDP for all MDPs. Additionally, we study the problem of whether a given value vector is realizable by some strategy, and show that this problem can be decided in polynomial time for irreducible MDPs and in NP for general MDPs. These results provide algorithms for design exploration in MDP models with multiple long-run average objectives.
1 Introduction
Markov decision processes (MDPs) are standard models for dynamic systems that exhibit both probabilistic and nondeterministic behaviors [11,5]. An MDP models a dynamic system that evolves through stages. In each stage, a controller chooses one of several actions (the nondeterministic choices), and the system stochastically evolves to a new state based on the current state and the chosen action. In addition, one associates a cost or reward with each transition, and the central question is to find a strategy for choosing actions that optimizes the rewards obtained over the run of the system. The two classical ways of combining the rewards over the run of the system are as follows: (a) the discounted sum of the rewards and (b) the long-run average of the rewards. In many modeling domains, however, there is no unique objective to be optimized, but multiple, potentially dependent and conflicting objectives. For example, in designing a computer system, one is interested not only in maximizing performance but also in minimizing power. Similarly, in an inventory management system, one wishes to optimize several potentially dependent costs for maintaining each kind of product, and in AI planning, one wishes to find a plan that optimizes several distinct goals. These examples motivate the study of MDPs with multiple objectives.
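For concreteness, the two criteria can be written out; this is a minimal formalization under notation assumed here rather than quoted from the paper (details such as liminf versus limsup are conventions assumed for this sketch). For a strategy σ, a reward function r, and the random state and action s_t, a_t at stage t,

\[
\mathrm{Val}_{\lambda}(\sigma) \;=\; \mathbb{E}^{\sigma}\Big[\sum_{t \ge 0} \lambda^{t}\, r(s_t, a_t)\Big] \quad \text{(discounted sum, } 0 < \lambda < 1\text{)};
\]
\[
\mathrm{Val}_{\mathrm{avg}}(\sigma) \;=\; \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\sigma}\Big[\sum_{t=0}^{T-1} r(s_t, a_t)\Big] \quad \text{(long-run average)}.
\]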
This research was supported by the NSF grants CCR-0225610 and CCR-0234690.
We study MDPs with multiple long-run average objectives, an extension of the MDP model where there are several reward functions [7,13]. In MDPs with multiple objectives, we are interested not in a single solution that is simultaneously optimal in all objectives (which may not exist), but in a notion of “trade-offs” called the Pareto curve. Informally, the Pareto curve consists of the set of realizable value profiles (or dually, the strategies that realize them) that are not dominated (in every dimension) by any other realizable value profile.
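As a sketch of the dominance order underlying the Pareto curve (notation assumed here, not taken from the paper): for value profiles \(\vec{u}, \vec{v} \in \mathbb{R}^{k}\) over k reward functions,

\[
\vec{u} \succ \vec{v} \;\iff\; u_i \ge v_i \text{ for all } 1 \le i \le k \ \text{ and } \ u_j > v_j \text{ for some } j,
\]

and the Pareto curve is the set of realizable profiles \(\vec{v}\) such that no realizable profile \(\vec{u}\) satisfies \(\vec{u} \succ \vec{v}\).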