A Vision-Based Architecture for Intent Recognition

Abstract. Understanding intent is an important aspect of communication among people and is an essential component of the human cognitive system. This capability is particularly relevant for situations that involve collaboration among multiple agents or detection of situations that can pose a particular threat. We propose an approach that allows a physical robot to detect the intentions of others based on experience acquired through its own sensory-motor abilities. The robot applies this experience while taking the perspective of the agent whose intent is to be recognized. To observe and analyze the current scene, the robot employs a novel vision-based technique for target detection and tracking that uses a non-parametric recursive modeling approach. Our intent recognition method uses a novel formulation of Hidden Markov Models (HMMs) designed to model a robot’s experience and its interaction with the world while performing various actions.

1 Introduction

The ability to understand the intent of others is critical for the success of communication and collaboration between people. The general principle of understanding intentions that we propose in this work is inspired by psychological evidence for a Theory of Mind [1], which states that people have a mechanism for representing, predicting, and interpreting each other’s actions. This mechanism, based on taking the perspective of others [2], gives people the ability to infer the intentions and goals that underlie action [3]. We base our work on these findings and take an approach that uses the observer’s own learned experience to detect the intentions of the agent or agents it observes. When matched with our own past experiences, sensory observations of another agent become indicative of what our intentions would be in the same situation.

The proposed system models the interactions with the world, acquired from visual information. This information is used in a novel formulation of Hidden Markov Models (HMMs) adapted to suit our needs. The distinguishing feature of our HMMs is that they model not only transitions between discrete states, but also the way in which the parameters encoding the goals of an activity change during its performance. This novel formulation of the HMM representation allows for recognition of the agents’ intent well before the actions are finalized.

G. Bebis et al. (Eds.): ISVC 2007, Part II, LNCS 4842, pp. 173–182, 2007. © Springer-Verlag Berlin Heidelberg 2007

A. Tavakkoli et al.

Our approach is composed of two modules: the vision module and the HMM module. The vision module performs low-level processing on video frames, such as detection and tracking of objects of interest. The detected objects are further processed in the vision module to generate their 3D positions, distances, and angles. This mid-level information is then used in the HMM module to perform the two main stages: activity modeling and intent recognition. During the first stage the robot learns an HMM for each activity it should later recognize. Du