(2020) 34:45

From demonstrations to task-space specifications. Using causal analysis to extract rule parameterization from demonstrations

Daniel Angelov1 · Yordan Hristov1 · Subramanian Ramamoorthy1

© The Author(s) 2020

Abstract Learning models of user behaviour is an important problem that is broadly applicable across many application domains requiring human–robot interaction. In this work, we show that it is possible to learn generative models for distinct user behavioural types, extracted from human demonstrations, by enforcing clustering of preferred task solutions within the latent space. We use these models to differentiate between user types and to find cases with overlapping solutions. Moreover, we can alter an initially guessed solution to satisfy the preferences that constitute a particular user type by backpropagating through the learned differentiable models. An advantage of structuring generative models in this way is that we can extract causal relationships between symbols that might form part of the user’s specification of the task, as manifested in the demonstrations. We further parameterize these specifications through constraint optimization in order to find a safety envelope under which motion planning can be performed. We show that the proposed method is capable of correctly distinguishing between three user types, who differ in degrees of cautiousness in their motion, while performing the task of moving objects with a kinesthetically driven robot in a tabletop environment. Our method successfully identifies the correct type, within the specified time, in 99% [97.8–99.8] of the cases, which outperforms an IRL baseline. We also show that our proposed method correctly changes a default trajectory to one satisfying a particular user specification even with unseen objects. The resulting trajectory is shown to be directly implementable on a PR2 humanoid robot completing the same task.

Keywords Human–robot interaction · Robot learning · Explainability

This research is supported by the Engineering and Physical Sciences Research Council (EPSRC), as part of the CDT in Robotics and Autonomous Systems at Heriot-Watt University and The University of Edinburgh. Grant reference EP/L016834/1.

* Daniel Angelov [email protected]

1 School of Informatics, University of Edinburgh, Edinburgh, United Kingdom

Autonomous Agents and Multi-Agent Systems

Fig. 1 Example setup: the demonstrated task is to return the pepper shaker to its original location, next to the salt shaker. Deciding which objects to avoid when performing the task can be seen as conditioning on the user specifications, which are given implicitly during a demonstration phase

Fig. 2 (1) Demonstrations either satisfy the user task specification by maintaining a distance from fragile objects (e.g. a wine glass), or fail to satisfy it by moving over sharp items. (2) An environment can have multiple clusters of valid trajectories in the latent space, conditioned on user type. (3) The validity of