The Catchment Feature Model: A Device for Multimodal Fusion and a Bridge between Signal and Sense
Francis Quek
Vision Interfaces and Systems Laboratory, Center for Human Computer Interaction, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
Email: [email protected]

Received 24 October 2002; Revised 16 February 2004

The catchment feature model addresses two questions in the field of multimodal interaction: how we bridge video and audio processing with the realities of human multimodal communication, and how information from the different modes may be fused. We argue from a detailed literature review that gestural research has clustered around manipulative and semaphoric use of the hands, motivate the catchment feature model from psycholinguistic research, and present the model. In contrast to "whole gesture" recognition, the catchment feature model applies a feature decomposition approach that facilitates cross-modal fusion at the level of discourse planning and conceptualization. We present our experimental framework for catchment feature-based research, cite three concrete examples of catchment features, and propose new directions of multimodal research based on the model.

Keywords and phrases: multimodal interaction, gesture interaction, multimodal communications, motion symmetries, gesture space use.
1. INTRODUCTION

The importance of gestures of the hand, head, face, eyebrows, eyes, and body posture in human communication, in conjunction with speech, is self-evident. This paper advances a device known as the "catchment" [1, 2, 3] and the concept of a "catchment feature", which together unify what can reasonably be extracted from video imagery with human discourse. The catchment feature model also serves as the basis for multimodal fusion at the level of discourse conceptualization. This represents a new direction for gesture and speech analysis that makes each indispensable to the other. To this end, this paper contextualizes engineering research in human gestures through a detailed literature analysis, advances the catchment feature model and the decomposed-feature approach it facilitates, presents an experimental framework for catchment feature-based research, lists examples that demonstrate the effectiveness of the concept, and proposes directions for the field to realize the broader vision of computational multimodal discourse understanding.

2. OF MANIPULATION AND SEMAPHORES
In [4], we argue that, with respect to human-computer interaction (HCI), the bulk of engineering-based gesture research may be classified as either manipulative or semaphoric. The former follows the tradition of Bolt's "Put-That-There" system [5, 6], which permits the direct manipulation of entities in a system. We extend this category to cover all systems of direct control, such as "finger flying" to navigate virtual spaces, control of appliances and games, and robot control. The essential characteristic of manipulative systems is the tight feedback between the gesture and the entity being controlled. Semaphore gest