Monte Carlo Tree Search for Verifying Reachability in Markov Decision Processes
1 Technical University of Munich, Munich, Germany [email protected]
2 Masaryk University, Brno, Czech Republic
Abstract. The maximum reachability probabilities in a Markov decision process can be computed using value iteration (VI). Recently, simulation-based heuristic extensions of VI have been introduced, such as bounded real-time dynamic programming (BRTDP), which often manage to avoid explicit analysis of the whole state space while preserving guarantees on the computed result. In this paper, we introduce a new class of such heuristics, based on Monte Carlo tree search (MCTS), a technique celebrated in various machine-learning settings. We provide a spectrum of algorithms ranging from MCTS to BRTDP. We evaluate these techniques and show that for larger examples, where VI is no longer applicable, our techniques are more broadly applicable than BRTDP, with only a minor additional overhead.
1 Introduction
Markov decision processes (MDP) [Put14] are a classical formalism for modelling systems with both non-deterministic and probabilistic behaviour. Although there are various analysis techniques for MDP that run in polynomial time and return precise results, such as those based on linear programming, they are rarely used. Indeed, dynamic programming techniques, such as value iteration (VI) or policy iteration, are usually preferred despite their exponential complexity. The reason is that for systems of sizes appearing in practice not even polynomial algorithms are useful, and heuristics utilizing the structure of the human-designed systems become a necessity. Consequently, probabilistic model checking has adopted techniques [BCC+14,ACD+17,KM17,KKKW18,DJKV17] which generally come with weaker guarantees on correctness and running time, but in practice perform better. These techniques originate from reinforcement learning, such as delayed Q-learning [SLW+06], or probabilistic planning, such as bounded real-time dynamic programming (BRTDP) [MLG05].

This research was supported in part by Deutsche Forschungsgemeinschaft (DFG) through the TUM International Graduate School of Science and Engineering (IGSSE) project 10.06 PARSEC, the Czech Science Foundation grant No. 18-11193S, and the DFG project 383882557 Statistical Unbounded Verification.

c Springer Nature Switzerland AG 2018. T. Margaria and B. Steffen (Eds.): ISoLA 2018, LNCS 11245, pp. 322–335, 2018. https://doi.org/10.1007/978-3-030-03421-4_21
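To make the VI baseline concrete, the following is a minimal, illustrative sketch (not taken from the paper) of classical value iteration for the maximum reachability probability in a tiny hypothetical MDP. The state names, transition probabilities, and tolerance `eps` are all invented for illustration. Note that stopping once the largest update falls below `eps` is only a heuristic criterion: by itself it does not bound the distance to the true value, which is one motivation for the guaranteed bounded methods discussed below.

```python
# transitions[state][action] = list of (successor, probability) pairs.
# "goal" is the target; "goal" and "sink" are absorbing.
transitions = {
    "s0": {"a": [("s1", 0.5), ("sink", 0.5)],
           "b": [("goal", 0.1), ("s0", 0.9)]},
    "s1": {"a": [("goal", 0.8), ("sink", 0.2)]},
    "goal": {},
    "sink": {},
}

def max_reachability(transitions, target, eps=1e-8):
    """Iterate V(s) = max_a sum_t P(s,a,t) * V(t) until updates drop below eps."""
    v = {s: (1.0 if s == target else 0.0) for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # absorbing states keep their value
                continue
            new = max(sum(p * v[t] for t, p in succ)
                      for succ in actions.values())
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < eps:
            return v

values = max_reachability(transitions, "goal")  # values["s1"] == 0.8
```

Every state is touched in every sweep, which is exactly the whole-state-space cost that the simulation-based heuristics below try to avoid.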
Since verification techniques are applied in safety-critical settings, the results produced by the techniques are required to be correct and optimal, or at least ε-optimal for a given precision ε. To this end, we follow the general scheme of [BCC+14] and combine the non-guaranteed heuristics with the traditional guaranteed techniques such as VI. However, while pure VI analyses the whole state space, the heuristics often allow us to focus only on a small part of it and still give precise estimates of the induced error. This approach was applied in [BCC+14], yielding a variant of
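The core idea behind such guaranteed heuristics can be sketched as follows (an illustrative simplification, not the paper's algorithm): maintain a lower bound L and an upper bound U on the maximal reachability probability and stop once U − L drops below ε at the initial state, which certifies an ε-optimal result. The MDP, state names, and tolerance here are hypothetical; a real BRTDP additionally samples paths so that only states visited by the simulations are updated, and initializing U soundly requires care with end components.

```python
def bounded_iteration(transitions, target, sink, init, eps=1e-6):
    """Bellman updates on both bounds until U(init) - L(init) < eps."""
    lo = {s: (1.0 if s == target else 0.0) for s in transitions}
    hi = {s: (0.0 if s == sink else 1.0) for s in transitions}
    while hi[init] - lo[init] >= eps:
        for bounds in (lo, hi):
            for s, actions in transitions.items():
                if actions:  # skip absorbing states
                    bounds[s] = max(
                        sum(p * bounds[t] for t, p in succ)
                        for succ in actions.values()
                    )
    return lo[init], hi[init]

# transitions[state][action] = list of (successor, probability) pairs.
transitions = {
    "s0": {"a": [("s1", 0.5), ("sink", 0.5)]},
    "s1": {"a": [("goal", 0.7), ("s0", 0.3)]},
    "goal": {},  # target, absorbing
    "sink": {},  # absorbing, cannot reach the target
}
lo, hi = bounded_iteration(transitions, "goal", "sink", "s0")
# The true value 0.35/0.85 lies in [lo, hi], so lo is eps-close to it.
```

Because the true value always lies between the two bounds, the gap U − L is a certified error estimate even when only part of the state space has been explored.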