An Overcomplete Signal Basis Approach to Nonlinear Time-Tone Analysis with Application to Audio and Speech Processing


Richard B. Reilly

School of Electrical, Electronic and Mechanical Engineering, University College Dublin, Belfield, Dublin 4, Ireland

Received 23 August 2004; Revised 22 March 2005; Accepted 25 March 2005

Although a beating tone and the two pure tones which give rise to it are linearly dependent, the ear considers them to be independent as tone sensations. A linear time-frequency representation of acoustic data is unable to model these phenomena. A time-tone sensation approach is proposed for inclusion within audio analysis systems. The proposed approach extends linear time-frequency analysis of acoustic data, by accommodating the nonlinear phenomenon of beats. The method replaces the one-dimensional tonotopic axis of linear time-frequency analysis with a two-dimensional tonotopic plane, in which one direction corresponds to tone, and the other to its frequency of modulation. Some applications to audio prostheses are discussed. The proposed method relies on an intuitive criterion of optimal representation which can be applied to any overcomplete signal basis, allowing for many signal processing applications.

Copyright © 2006 Hindawi Publishing Corporation. All rights reserved.
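The linear dependence mentioned in the abstract follows from the standard identity cos A + cos B = 2 cos((A + B)/2) cos((A - B)/2): the sum of two pure tones is, sample for sample, a single tone at the mean frequency whose amplitude is modulated at half the difference frequency. The sketch below checks this numerically. It is an illustration only and not part of the proposed method; the function names, the frequencies 440 Hz and 444 Hz, and the 8 kHz sample rate are all assumptions chosen for the example.

```python
import math


def two_tone_sum(f1, f2, t):
    """Sum of two unit-amplitude pure tones at f1 and f2 (Hz) at time t (s)."""
    return math.cos(2 * math.pi * f1 * t) + math.cos(2 * math.pi * f2 * t)


def beating_tone(f1, f2, t):
    """The same signal written as a carrier at the mean frequency
    multiplied by a slowly varying envelope."""
    carrier = math.cos(2 * math.pi * 0.5 * (f1 + f2) * t)
    envelope = 2 * math.cos(2 * math.pi * 0.5 * (f1 - f2) * t)
    return envelope * carrier


def max_residual(f1, f2, sample_rate=8000, duration=1.0):
    """Largest absolute difference between the two forms over the sampled interval."""
    n = int(sample_rate * duration)
    return max(
        abs(two_tone_sum(f1, f2, k / sample_rate)
            - beating_tone(f1, f2, k / sample_rate))
        for k in range(n)
    )


# Illustrative (assumed) frequencies: two tones 4 Hz apart.
residual = max_residual(440.0, 444.0)
print(residual)  # differs from zero only by floating-point rounding
```

Because the two forms agree at every sample, a linear analysis has no basis for treating the beating tone as a component separate from the two pure tones. The magnitude of the envelope repeats |f1 - f2| times per second, which is the rate at which the beats are heard.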

1. INTRODUCTION

Speech recognition is a hierarchical process consisting of four main phases: audio analysis, speech feature extraction, pattern classification, and language processing [1, 2]. The audio analysis phase relies on a mathematical model of the human cochlea as a frequency analyzer. To a first approximation, the cochlea is an array of overlapping linear bandpass filters. This model dates back to Von Helmholtz in 1863 [3] and still plays an important role in audio signal processing systems. However, there is clear evidence that the cochlea is inherently nonlinear, and that this nonlinearity is not merely a result of overloading it at high signal levels. As a result, the overlapping linear filter model fails to account for essentially nonlinear phenomena of human audition, such as masking, beats, and the sensation of Tartini or combination tones [4, 5]. Masking is an effect whereby the threshold of hearing of a test tone is raised in the presence of a masking tone. This threshold depends on the frequency separation of the test and masking tones; tones outside the critical band have little influence on it. A test tone close to the masking tone interferes with it in the form of beats [5]. Combination or Tartini tones are heard when a listener presented with two pure tones hears a third tone which is not actually present. A linear frequency-analysis model of audition cannot account for these phenomena.

The basilar membrane within the cochlea detects the component frequencies, or tones, of incoming sound. Due to the flexible nature of the membrane, incoming vibrations set up a travelling wave along the membrane giving it a different maximum displacement