Automatic Speechreading with Applications to Human-Computer Interfaces
Xiaozheng Zhang
Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA

Charles C. Broun
Motorola Human Interface Lab, Tempe, AZ 85284, USA

Russell M. Mersereau
Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA

Mark A. Clements
Center for Signal and Image Processing, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA

Received 30 October 2001 and in revised form 31 July 2002

There has been growing interest in introducing speech as a new modality into the human-computer interface (HCI). Motivated by the multimodal nature of speech, the visual component is considered to yield information that is not always present in the acoustic signal and to enable improved system performance over acoustic-only methods, especially in noisy environments. In this paper, we investigate the usefulness of visual speech information in HCI related applications. We first introduce a new algorithm for automatically locating the mouth region by using color and motion information and segmenting the lip region by making use of both color and edge information based on Markov random fields. We then derive a relevant set of visual speech parameters and incorporate them into a recognition engine. We present various visual feature performance comparisons to explore their impact on the recognition accuracy, including the lip inner contour and the visibility of the tongue and teeth. By using a common visual feature set, we demonstrate two applications that exploit speechreading in a joint audio-visual speech signal processing task: speech recognition and speaker verification. The experimental results based on two databases demonstrate that the visual information is highly effective for improving recognition performance over a variety of acoustic noise levels.

Keywords and phrases: automatic speechreading, visual feature extraction, Markov random fields, hidden Markov models, polynomial classifier, speech recognition, speaker verification.
1. INTRODUCTION
In recent years there has been growing interest in introducing new modalities into human-computer interfaces (HCIs). Natural means of communication between humans and computers, using speech instead of a mouse and keyboard, provide an attractive alternative for HCI. With this motivation, much research has been carried out in automatic speech recognition (ASR). Mainstream speech recognition has focused almost exclusively on the acoustic signal. Although purely acoustic-based ASR systems yield excellent results in the laboratory environment, the recognition error can increase dramatically in the real world in the presence of noise, such as in a typical office environment with ringing telephones and noise from fans and human conversations. Noise-robust methods using feature-normalization algorithms, microphone arrays, and related signal processing techniques have been only partially successful in such conditions.
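To make the feature-normalization idea concrete, the sketch below applies cepstral mean normalization to a matrix of MFCC frames. This is a generic illustration of one such technique, not the specific normalization scheme used in the systems described in this paper; the function name, array shapes, and the synthetic usage data are assumptions introduced here for exposition.

```python
import numpy as np

def cepstral_mean_normalization(mfcc_frames: np.ndarray) -> np.ndarray:
    """Subtract the per-coefficient mean over time from a cepstral
    feature matrix of shape (num_frames, num_coefficients).

    Removing the long-term cepstral mean cancels stationary channel
    effects (e.g., a fixed microphone response), which is one of the
    simpler noise-robustness techniques mentioned above.
    """
    return mfcc_frames - mfcc_frames.mean(axis=0, keepdims=True)

# Hypothetical usage: 300 frames of 13 MFCCs for one utterance.
utterance = np.random.randn(300, 13)
normalized = cepstral_mean_normalization(utterance)
```

Normalization of this kind helps against stationary channel mismatch, but it does little against the nonstationary additive noise of a real office, which is part of the motivation for exploiting the visual modality.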