Interactive task learning via embodied corrective feedback

Mattias Appelgren¹ · Alex Lascarides¹

¹ School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, Scotland, UK

Published online: 27 September 2020
© The Author(s) 2020

Abstract

This paper addresses a task in Interactive Task Learning (Laird et al., IEEE Intell Syst 32:6–21, 2017). The agent must learn to build towers that are constrained by rules, and whenever the agent performs an action that violates a rule, the teacher provides verbal corrective feedback: e.g. "No, red blocks should be on blue blocks". The agent must learn to build rule-compliant towers from these corrections and the context in which they were given. Not only is the agent ignorant of the rules at the start of the learning process, but it also has a deficient domain model, which lacks the concepts in which the rules are expressed. Therefore an agent that takes advantage of the linguistic evidence must learn the denotations of neologisms and adapt its conceptualisation of the planning domain to incorporate those denotations. We show that by incorporating constraints on interpretation that are imposed by discourse coherence into the models for learning (Hobbs in On the Coherence and Structure of Discourse, Stanford University, Stanford, 1985; Asher and Lascarides in Logics of Conversation, Cambridge University Press, Cambridge, 2003), an agent that utilizes linguistic evidence outperforms a strong baseline that does not.

Keywords: Human robot interaction · Interactive learning · Knowledge representation and reasoning
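To make the setting concrete, the following is a minimal Python sketch of the interaction loop the abstract describes, not the authors' implementation: the Block type, the teacher_feedback function, and the hard-coded colour rule are all illustrative assumptions, and the agent here acts randomly rather than learning.

from __future__ import annotations
import random
from dataclasses import dataclass

@dataclass
class Block:
    colour: str

def violates_rule(tower: list[Block]) -> bool:
    # The rule is hidden from the agent and known only to the teacher:
    # here, a red block must sit directly on a blue block.
    for below, above in zip(tower, tower[1:]):
        if above.colour == "red" and below.colour != "blue":
            return True
    return False

def teacher_feedback(tower: list[Block]) -> str | None:
    # The teacher speaks only when the latest action violated the rule.
    if violates_rule(tower):
        return "No, red blocks should be on blue blocks"
    return None

# A placeholder agent: it places random blocks. A learning agent would
# instead update its beliefs about the rule (and about unknown words such
# as "red") from each correction and the context in which it was given.
colours = ["red", "blue", "green"]
tower: list[Block] = []
for step in range(10):
    tower.append(Block(random.choice(colours)))
    correction = teacher_feedback(tower)
    if correction is not None:
        print(f"step {step}: teacher: {correction!r}")
        tower.pop()  # undo the violating action

The learning problem the paper addresses is how to replace this random agent with one that infers both the rule and the meanings of the teacher's words from such corrections and their context.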

1 Introduction

The nascent field of Interactive Task Learning (ITL) aims to develop agents that can learn arbitrary new tasks through a combination of their own actions in the environment and an ongoing interaction with a teacher (see Laird et al. [41] for a recent survey). A common assumption behind many AI systems is that any required capabilities can be programmed and trained prior to deployment. However, this assumption may be untenable for tasks that contain a vast array of contingencies. It is also problematic if the task is one where unforeseen changes to what constitutes successful behaviour can occur after an agent is deployed: for
instance, tasks where the set of possible options, or the specifications that govern correct behaviour, can change at any given time. Motivated by such issues, ITL seeks to create agents that can learn after they are deployed, through situated interactions which are natural to the human domain expert that they interact with. Although interaction can take many forms, such as demonstration through imitation or teleoperation [6], our interest lies in approaches that make use of natural language to teach agents. A common formulation of such a learning process is as a situated and extended discourse between teacher and agent, much like one