A Natural Language Argumentation Interface for Explanation Generation in Markov Decision Processes


Abstract. A Markov Decision Process (MDP) policy presents, for each state, an action which preferably maximizes the expected reward accrued over time. In this paper, we present a novel system that generates, in real time, natural language explanations of the optimal action recommended by an MDP policy while the user interacts with that policy. We rely on natural language explanations to build trust between the user and the explanation system, leveraging existing research in psychology to generate explanations that are salient to the end user. Our explanation system is designed for portability between domains and uses a combination of domain-specific and domain-independent techniques. The system automatically extracts implicit knowledge from an MDP model and its accompanying policy. This policy-based explanation system can be ported between applications without additional effort by knowledge engineers or model builders. Our system separates domain-specific data from the explanation logic, allowing for a robust system capable of incremental upgrades. Domain-specific explanations are generated through case-based explanation techniques specific to the domain and a knowledge base of concept mappings for our natural language model.
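For reference, the optimality criterion alluded to by "expected reward accrued over time" is, in the standard infinite-horizon discounted formulation (a sketch in conventional notation, not taken from the paper):

% For an MDP (S, A, T, R) with discount factor 0 <= \gamma < 1,
% the value of following policy \pi from state s, and the optimal policy:
V^{\pi}(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, \pi(s_t)) \,\middle|\, s_0 = s\right],
\qquad
\pi^{*} = \arg\max_{\pi} V^{\pi}(s) \quad \forall s \in S.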

1 Introduction

A Markov decision process (MDP) is a mathematical formalism which allows for long-range planning in probabilistic environments [2, 15]. The work reported here uses fully observable, factored MDPs [3]. The fundamental concepts used by our system generalize to other MDP formalisms; we chose the factored MDP representation because it will allow us to extend our system to scenarios where we recommend a set of actions per time step. A policy for an MDP is a mapping from states to actions that defines a tree of possible futures, each with a probability and a utility. Unfortunately, this branching set of possible futures is a large object with many potential branches, and it is difficult to understand even for sophisticated users. The complex nature of possible futures and their probabilities prevents many end users from trusting, understanding, and implementing the plans generated from MDP policies [9]. Recommendations and plans generated by computers are not always trusted or implemented by the end users of decision support systems. Distrust and misunderstanding are two of the most often cited reasons users give for not following a recommended plan or action [13].
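To make concrete why this tree of futures is hard to inspect, consider the following minimal Python sketch (ours, not the authors' implementation; the `policy` and `transitions` structures are hypothetical illustrations) of unrolling every branch of a policy's future tree:

# A policy maps each state to one action; each state-action pair maps to
# a list of (next_state, probability, reward) outcomes. Unrolling the
# policy to a fixed horizon enumerates every possible future branch.
def enumerate_futures(state, policy, transitions, horizon,
                      prob=1.0, utility=0.0):
    """Yield (branch_probability, accumulated_utility) for every branch
    of the future tree reachable under `policy` within `horizon` steps."""
    if horizon == 0:
        yield prob, utility
        return
    action = policy[state]  # the policy recommends one action per state
    for next_state, p, reward in transitions[state][action]:
        yield from enumerate_futures(next_state, policy, transitions,
                                     horizon - 1, prob * p, utility + reward)

# Toy two-state example: each step branches two ways, so the tree has
# 2^horizon leaves even in this tiny model.
transitions = {
    "s0": {"a": [("s0", 0.7, 1.0), ("s1", 0.3, 0.0)]},
    "s1": {"b": [("s0", 0.5, 2.0), ("s1", 0.5, 0.0)]},
}
policy = {"s0": "a", "s1": "b"}

branches = list(enumerate_futures("s0", policy, transitions, horizon=5))
print(len(branches))                # 32 branches after only 5 steps
print(sum(p for p, _ in branches))  # branch probabilities sum to ~1.0

Even this two-state toy yields 2^h branches after h steps; realistic factored MDPs branch far more aggressively, which is why we summarize a policy in natural language rather than exposing the raw tree to the user.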

For a user unfamiliar with stochastic planning, the most troublesome part of existing explanation systems is the explicit use of probabilities, as humans are demonstrably bad at reasoning with probabilities [18]. Additionally, it is our intuition that the notion of a preordained probability of success or failure at a given endeavor discomforts the average user. Following the classifications of logical arguments and explanation