Sequential Extensions of Causal and Evidential Decision Theory


Abstract. Moving beyond the dualistic view in AI, where agent and environment are separated, incurs new challenges for decision making, as the calculation of expected utility is no longer straightforward. The non-dualistic decision theory literature is split between causal decision theory and evidential decision theory. We extend these decision algorithms to the sequential setting, where the agent alternates between taking actions and observing their consequences. We find that evidential decision theory has two natural extensions while causal decision theory only has one.

Keywords: Evidential decision theory · Causal decision theory · Planning · Causal graphical models · Dualism · Physicalism

1 Introduction

In artificial-intelligence problems an agent interacts sequentially with an environment by taking actions and receiving percepts [RN10]. This model is dualistic: the agent is distinct from the environment. It influences the environment only through its actions, and the environment has no other information about the agent. The dualism assumption is accurate for an algorithm that is playing chess, go, or other (video) games, which explains why it is ubiquitous in AI research. But often it is not true: real-world agents are embedded in (and computed by) the environment [OR12], and then a physicalistic model¹ is more appropriate.

This distinction becomes relevant in multi-agent settings with similar agents, where each agent encounters ‘echoes’ of its own decision making. If the other agents are running the same source code, then the agents’ decisions are logically connected. This link can be used for uncoordinated cooperation [LFY+14]. Moreover, a physicalistic model is indispensable for self-reflection. If the agent is required to autonomously verify its integrity and perform maintenance, repair, or upgrades, then it needs to be aware of its own functioning. For this, reliable and accurate self-modeling is essential. Today, applications of this level of autonomy are mostly restricted to space probes distant from Earth or robots navigating lethal situations, but in the future this might also become crucial for sustained self-improvement in generally intelligent agents [Yud08, Bos14, SF14a, RDT+15].

¹ Some authors also call this type of model materialistic or naturalistic.

© Springer International Publishing Switzerland 2015. T. Walsh (Ed.): ADT 2015, LNAI 9346, pp. 205–221, 2015. DOI: 10.1007/978-3-319-23114-3_13

T. Everitt et al.

[Fig. 1 depicts an agent π interacting with an environment that contains a hidden state s; the agent's environment model μ includes a self-model, and interaction proceeds via actions a_t and percepts e_t.]

Fig. 1. The physicalistic model. The hidden state s contains information about the agent that is unknown to it. The distribution μ is the agent's (subjective) environment model, and π its (deterministic) policy. The agent models itself through the beliefs about (future) actions given by its environment model μ. Interaction with the environment at time step t occurs through an action a_t chosen by the agent and a percept e_t returned by the environment.
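The alternating action/percept interaction just described can be sketched as a simple loop in which a deterministic policy π maps the history to an action a_t and the environment returns a percept e_t. This is a minimal illustration; the toy policy and toy environment below are our own assumptions, not from the paper:

```python
def interact(policy, environment, steps):
    """Alternate actions a_t (chosen by the policy) and percepts e_t
    (returned by the environment), accumulating the history."""
    history = []
    for t in range(steps):
        a_t = policy(tuple(history))             # deterministic policy pi
        e_t = environment(tuple(history), a_t)   # environment's response
        history.append((a_t, e_t))
    return history

# Toy deterministic policy (illustrative): take action 0 on an empty
# history, then echo the most recent percept as the next action.
def policy(history):
    return 0 if not history else history[-1][1]

# Toy environment (illustrative): the percept is 1 minus the action.
def environment(history, action):
    return 1 - action

print(interact(policy, environment, 4))
# → [(0, 1), (1, 0), (0, 1), (1, 0)]
```

Note that both the policy and the environment condition on the full history, which is what makes the setting sequential rather than a one-shot decision problem.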

In the physicalistic model the agent is embedded