An Overcomplete Signal Basis Approach to Nonlinear Time-Tone Analysis with Application to Audio and Speech Processing


Richard B. Reilly

School of Electrical, Electronic and Mechanical Engineering, University College Dublin, Belfield, Dublin 4, Ireland

Received 23 August 2004; Revised 22 March 2005; Accepted 25 March 2005

Although a beating tone and the two pure tones which give rise to it are linearly dependent, the ear considers them to be independent as tone sensations. A linear time-frequency representation of acoustic data is unable to model these phenomena. A time-tone sensation approach is proposed for inclusion within audio analysis systems. The proposed approach extends linear time-frequency analysis of acoustic data by accommodating the nonlinear phenomenon of beats. The method replaces the one-dimensional tonotopic axis of linear time-frequency analysis with a two-dimensional tonotopic plane, in which one direction corresponds to tone and the other to its frequency of modulation. Some applications to audio prostheses are discussed. The proposed method relies on an intuitive criterion of optimal representation which can be applied to any overcomplete signal basis, allowing for many signal processing applications.

Copyright © 2006 Hindawi Publishing Corporation. All rights reserved.
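The linear dependence noted in the abstract can be checked numerically. The following is a minimal sketch, not taken from the paper: it uses the standard identity sin A + sin B = 2 sin((A + B)/2) cos((A - B)/2) to compare the sum of two pure tones with a single tone at the mean frequency, amplitude-modulated at half the difference frequency. The frequencies 440 Hz and 444 Hz, the sample rate, and all function names are assumptions chosen only for illustration.

```python
import math


def two_tone_sum(f1, f2, t):
    """Sum of two unit-amplitude pure tones at f1 and f2 (Hz), at time t (s)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def beating_tone(f1, f2, t):
    """Tone at the mean frequency, modulated at half the difference frequency."""
    carrier = math.sin(2 * math.pi * 0.5 * (f1 + f2) * t)
    envelope = 2 * math.cos(2 * math.pi * 0.5 * (f1 - f2) * t)
    return envelope * carrier


def max_abs_difference(f1, f2, sample_rate=8000, duration=1.0):
    """Largest sample-wise gap between the two descriptions of the signal."""
    n = int(sample_rate * duration)
    return max(
        abs(two_tone_sum(f1, f2, k / sample_rate)
            - beating_tone(f1, f2, k / sample_rate))
        for k in range(n)
    )


# Hypothetical example values (not from the paper).
F1, F2 = 440.0, 444.0
residual = max_abs_difference(F1, F2)
# The envelope magnitude repeats every 1 / |F1 - F2| seconds.
beat_rate = abs(F1 - F2)

print(f"max |sum - beating tone| = {residual:.2e}")
print(f"beat rate = {beat_rate} Hz")
```

The residual is at the level of floating-point rounding, so the two descriptions are the same waveform. A linear analysis therefore has no basis for preferring one over the other, whereas a listener can hear either two tones or one beating tone.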

1. INTRODUCTION

Speech recognition is a hierarchical process consisting of four main phases: audio analysis, speech feature extraction, pattern classification, and language processing [1, 2]. The audio analysis phase relies on a mathematical model of the human cochlea as a frequency analyzer. To a first approximation, the cochlea is an array of overlapping linear bandpass filters. This model dates back to von Helmholtz in 1863 [3] and still plays an important role in audio signal processing systems. There is, however, clear evidence that the cochlea is inherently nonlinear, and that this nonlinearity is not merely a result of overloading it at high signal levels. As a result, the overlapping linear filter model fails to account for essentially nonlinear phenomena of human audition, such as masking, beats, and the sensation of Tartini or combination tones [4, 5]. Masking is an effect whereby the threshold of hearing of a test tone is increased in the presence of a masking tone. This threshold depends on the frequency separation of the test and masking tones; tones outside the critical band around the masking tone have little influence on the threshold. When the test tone is close in frequency to the masking tone, the two tones interfere in the form of beats [5]. An example of the sensation of combination or Tartini tones occurs when a listener, on being presented with two pure tones, hears a third tone which is not actually present. A linear frequency-analysis model of audition cannot account for these phenomena.

The basilar membrane within the cochlea detects the component frequencies, or tones, of incoming sound. Due to the flexible nature of the membrane, incoming vibrations set up a travelling wave along the membrane giving it a different maximum displacement
