The Catchment Feature Model: A Device for Multimodal Fusion and a Bridge between Signal and Sense
Francis Quek
Vision Interfaces and Systems Laboratory, Center for Human Computer Interaction, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
Email: [email protected]

Received 24 October 2002; Revised 16 February 2004

The catchment feature model addresses two questions in the field of multimodal interaction: how we bridge video and audio processing with the realities of human multimodal communication, and how information from the different modes may be fused. We argue from a detailed literature review that gestural research has clustered around manipulative and semaphoric use of the hands, motivate the catchment feature model from psycholinguistic research, and present the model. In contrast to "whole gesture" recognition, the catchment feature model applies a feature decomposition approach that facilitates cross-modal fusion at the level of discourse planning and conceptualization. We present our experimental framework for catchment feature-based research, cite three concrete examples of catchment features, and propose new directions of multimodal research based on the model.

Keywords and phrases: multimodal interaction, gesture interaction, multimodal communications, motion symmetries, gesture space use.
1. INTRODUCTION

The importance of gestures of the hand, head, face, eyebrows, eyes, and body posture in human communication, in conjunction with speech, is self-evident. This paper advances a device known as the "catchment" [1, 2, 3] and the concept of a "catchment feature", which together unify what can reasonably be extracted from video imagery with human discourse. The catchment feature model also serves as the basis for multimodal fusion at the level of discourse conceptualization. This represents a new direction for gesture and speech analysis that makes each indispensable to the other. To this end, this paper contextualizes engineering research in human gestures through a detailed literature analysis, advances the catchment feature model and the decomposed-feature approach it facilitates, presents an experimental framework for catchment feature-based research, lists examples that demonstrate the effectiveness of the concept, and proposes directions for the field to realize the broader vision of computational multimodal discourse understanding.

2. OF MANIPULATION AND SEMAPHORES
In [4], we argue that, with respect to human-computer interaction (HCI), the bulk of engineering-based gesture research may be classified as either manipulative or semaphoric. The former follows the tradition of Bolt's "Put-That-There" system [5, 6], which permits the direct manipulation of entities in a system. We extend this category to cover all systems of direct control, such as "finger flying" to navigate virtual spaces, control of appliances and games, and robot control. The essential characteristic of manipulative systems is the tight feedback between the gesture and the entity being controlled. Semaphore gest