Bayesian Inverse Reinforcement Learning for Modeling Conversational Agents in a Virtual Environment
Lina M. Rojas-Barahona (1) and Christophe Cerisara (2)
(1) Université de Lorraine/LORIA, Nancy
(2) CNRS/LORIA, Nancy
{lina.rojas,christophe.cerisara}@loria.fr
Abstract. This work proposes a Bayesian approach to learn the behavior of human characters that give advice and help users to complete tasks in a situated environment. We apply Bayesian Inverse Reinforcement Learning (BIRL) to infer this behavior in the context of a serious game, given evidence in the form of stored dialogues provided by experts who play the role of several conversational agents in the game. We show that the proposed approach converges relatively quickly and that it outperforms two baseline systems, including a dialogue manager trained to provide “locally” optimal decisions.
1 Introduction

Reinforcement Learning (RL) has been widely used for learning dialogue strategies [1–5]. Dialogues are modeled as an optimization problem, simulating the inherent dynamic behavior of conversations in order to find the globally optimal policy. However, the RL problem assumes that the reward function is known. In practice the reward function is usually handcrafted; as pointed out in [6], “the reward function is almost always set by intuition, not data”. Inverse Reinforcement Learning (IRL) was defined in [7] as the problem of recovering the reward function from experts’ demonstrations: it seeks a reward function whose optimal policy, i.e., the policy that maximizes the expected cumulative reward in the long run, follows as closely as possible the examples provided by the experts.

In this work we explore Bayesian Inverse Reinforcement Learning (BIRL) [8] to infer the reward function from humans who perform the task of instructing players in a serious game. We also apply the improvements to BIRL proposed in [9], namely the Modified BIRL (MBIRL), in order to reduce the computational complexity in large state spaces. This work is a first step towards dialogue optimization with user simulation: instead of designing in advance a reward function that “properly instructs players”, which is a difficult and subjective task, we propose to learn it from humans. Once the reward function has been found, classical reinforcement learning with user simulation can be applied to build a dialogue system, which can afterwards be tested with real users.

The adapted Bayesian approach is evaluated in terms of policy loss [9] and is compared against two baselines. The first uses random rewards, while the second exploits corpus-estimated locally-optimal rewards (i.e., supervised learning). The results show that the proposed approach converges relatively quickly and consistently
outperforms both baselines, which confirms that taking into account the dynamic properties of the environment leads to virtual characters that better reproduce the behavior of experts. Qualitatively, our models have thus learned to adequately inform users and provide help when needed.
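To make the kind of inference performed by BIRL concrete, the sketch below implements a PolicyWalk-style loop on a toy tabular MDP: a random walk over candidate reward vectors is accepted or rejected according to the Boltzmann likelihood of the experts’ state-action pairs, and a policy-loss measure compares a learned policy against the optimal one under a reference reward. This is a minimal illustration under assumed names, hyper-parameters, and toy data (value_iteration, sample_reward_posterior, alpha, the transition tensor P, the demos list), not the MBIRL implementation evaluated in this paper.

```python
import numpy as np

def value_iteration(R, P, gamma=0.95, tol=1e-6):
    """Return Q-values of a finite MDP with state rewards R[s] and transitions P[a][s, s']."""
    n_states, n_actions = len(R), len(P)
    V = np.zeros(n_states)
    while True:
        Q = np.stack([R + gamma * P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q
        V = V_new

def log_likelihood(Q, demos, alpha=2.0):
    """Boltzmann (softmax) log-likelihood of expert (state, action) pairs given Q-values."""
    logp = 0.0
    for s, a in demos:
        z = alpha * Q[s]
        logp += z[a] - (z.max() + np.log(np.exp(z - z.max()).sum()))  # log-softmax
    return logp

def sample_reward_posterior(P, demos, n_states, n_iter=2000, step=0.1, alpha=2.0, seed=0):
    """Random-walk MCMC over reward vectors, accepted by the BIRL posterior ratio
    (a uniform prior over [-1, 1]^n cancels in the Metropolis acceptance ratio)."""
    rng = np.random.default_rng(seed)
    R = np.zeros(n_states)
    ll = log_likelihood(value_iteration(R, P), demos, alpha)
    samples = []
    for _ in range(n_iter):
        R_new = np.clip(R + rng.uniform(-step, step, n_states), -1.0, 1.0)
        ll_new = log_likelihood(value_iteration(R_new, P), demos, alpha)
        if np.log(rng.random()) < ll_new - ll:
            R, ll = R_new, ll_new
        samples.append(R.copy())
    return np.mean(samples[n_iter // 2:], axis=0)  # posterior mean after burn-in

def policy_loss(R_ref, P, policy, gamma=0.95):
    """Sup-norm gap between the optimal value under R_ref and the value of `policy`."""
    n = len(R_ref)
    P_pi = np.stack([P[policy[s]][s] for s in range(n)])
    V_pi = np.linalg.solve(np.eye(n) - gamma * P_pi, R_ref)
    V_opt = value_iteration(R_ref, P, gamma).max(axis=1)
    return float(np.max(V_opt - V_pi))

# Toy usage: 3 dialogue states, 2 system actions, a handful of expert turns.
P = np.array([[[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],
              [[0.2, 0.7, 0.1], [0.0, 0.3, 0.7], [0.1, 0.0, 0.9]]])
demos = [(0, 1), (1, 1), (2, 0), (0, 1)]
R_hat = sample_reward_posterior(P, demos, n_states=3)
policy_hat = value_iteration(R_hat, P).argmax(axis=1)
```

In the actual setting of this paper, the dialogue state space is too large to solve the MDP exactly at every sampling step, which is precisely what motivates the MBIRL approximations of [9].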
2 Reinforcement Learning