A Vision-Based Architecture for Intent Recognition

Understanding intent is an important aspect of communication among people and is an essential component of the human cognitive system. This capability is particularly relevant for situations that involve collaboration among multiple agents or detection of

PDF / 2,194,317 Bytes
10 Pages / 430 x 660 pts Page_size
75 Downloads / 281 Views

DOWNLOAD

REPORT

Abstract. Understanding intent is an important aspect of communication among people and is an essential component of the human cognitive system. This capability is particularly relevant for situations that involve collaboration among multiple agents or detection of situations that can pose a particular threat. We propose an approach that allows a physical robot to detect the intentions of others based on experience acquired through its own sensory-motor abilities. It uses this experience while taking the perspective of the agent whose intent should be recognized. The robot’s capability to observe and analyze the current scene employs a novel vision-based technique for target detection and tracking, using a non-parametric recursive modeling approach. Our intent recognition method uses a novel formulation of Hidden Markov Models (HMM’s) designed to model a robot’s experience and its interaction with the world while performing various actions.

1

Introduction

The ability to understand the intent of others is critical for the success of communication and collaboration between people. The general principle of understanding intentions that we propose in this work is inspired from psychological evidence of a Theory of Mind [1], which states that people have a mechanism for representing, predicting and interpreting each other’s actions. This mechanism, based on taking the perspective of others [2], gives people the ability to infer the intentions and goals that underlie action [3]. We base our work on these ﬁndings and we take an approach that uses the observer’s own learned experience to detect the intentions of the agent or agents it observes. When matched with our own past experiences, these sensory observations become indicative of what our intentions would be in the same situation. The proposed system models the interactions with the world, acquired from visual information. This information is used in a novel formulation of Hidden Markov Models (HMMs) adapted to suit our needs. The distinguishing feature in our HMMs is that they model not only transitions between discrete states, but also the way in which the parameters encoding the goals of an activity change G. Bebis et al. (Eds.): ISVC 2007, Part II, LNCS 4842, pp. 173–182, 2007. c Springer-Verlag Berlin Heidelberg 2007

174

A. Tavakkoli et al.

during its performance. This novel formulation of the HMM representation allows for recognition of the agents’ intent well before the actions are ﬁnalized. Our approach is composed of two modules: the Vision module and the HMM module. The vision module performs low-level processing on video frames such as detection and tracking of objects of interest. The detected objects are further processed in the vision module and their 3D positions, distances, and angles are generated. This mid-level information is ﬁnally used in the HMM module to perform the two main stages: the activity modeling and the intent recognition. During the ﬁrst stage the robot learns corresponding HMM’s for each activity it should later recognize. Du

Data Loading...

A Vision-Based Architecture for Intent Recognition

Recommend Documents

Designing for Intent-to-Treat

Binarized Neural Architecture Search for Efficient Object Recognition

Legislative Intent

Query Intent Understanding

A Collective Theory of Genocidal Intent

Face Sketch-Image Recognition for Criminal Detection Using a GAN Architecture

A Novel CNN Architecture for Real-Time Point Cloud Recognition in Road Environment

Arabic (Indian) digit handwritten recognition using recurrent transfer deep architecture

Quality Architecture: A Concept for Quality Manufacturing

A Digital Product Memory Architecture for Cars

D-GHNAS for Joint Intent Classification and Slot Filling

Three-dimensional CNN-inspired deep learning architecture for Yoga pose recognition in the real-world environment