Perceptual Models for Speech, Audio, and Music Processing

Editorial

Jont B. Allen,1 Wai-Yip Geoffrey Chan,2 and Stephen Voran3

1 Beckman Institute, University of Illinois, 405 North Mathews Avenue, Urbana, IL 61801, USA
2 Electrical and Computer Engineering Department, Queen's University, 99 University Avenue, Kingston, ON, Canada K7L 3N6
3 Institute for Telecommunication Sciences, 325 Broadway, Boulder, CO 80305, USA

Received 22 November 2007; Accepted 22 November 2007

Copyright © 2007 Jont B. Allen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

New understandings of human auditory perception have recently contributed to advances in numerous areas related to audio, speech, and music processing. These include coding, speech and speaker recognition, synthesis, signal separation, signal enhancement, automatic content identification and retrieval, and quality estimation. Researchers continue to seek more detailed, accurate, and robust characterizations of human auditory perception, from the periphery to the auditory cortex, and in some cases whole-brain inventories. This special issue on Perceptual Models for Speech, Audio, and Music Processing contains seven papers that exemplify the breadth and depth of current work in perceptual modeling and its applications.

The issue opens with "Practical gammatone-like filters for auditory processing" by A. G. Katsiamis et al., which contains a nice review of how to make cochlea-like filters using classical signal processing methods. As described in the paper, the human cochlea is nonlinear; this nonlinearity is believed to manage dynamic range, perhaps because individual neurons themselves have a small dynamic range. A time-domain model of the cochlea with a built-in nonlinearity is an important tool in many signal processing applications. This paper shows one way this might be accomplished using a cascade of second-order sections. While we do not know how the human cochlea accomplishes this nonlinear filtering, the technique described here is one reasonable method for solving this very difficult problem.
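The cascade-of-second-order-sections idea can be sketched compactly. The Python fragment below is a minimal illustration and is not taken from the paper: it assumes the Glasberg and Moore ERB formula, the conventional 1.019 bandwidth factor, and identical all-pole resonator sections, and it omits the level-dependent (compressive) behavior, which would enter by letting the pole radius vary with signal level.

import numpy as np
from scipy.signal import sosfilt

def gammatone_like_sos(fc, fs, order=4, b_factor=1.019):
    # Approximate a 4th-order gammatone magnitude response with a cascade
    # of identical second-order all-pole resonators centered at fc (Hz).
    erb = 24.7 + 0.108 * fc                  # Glasberg & Moore ERB in Hz (assumed)
    bw = 2.0 * np.pi * b_factor * erb        # pole bandwidth in rad/s
    r = np.exp(-bw / fs)                     # pole radius
    theta = 2.0 * np.pi * fc / fs            # pole angle
    a1, a2 = -2.0 * r * np.cos(theta), r * r
    # Normalize each section to unity gain at fc so the cascade peaks near 0 dB.
    z = np.exp(-1j * theta)
    b0 = abs(1.0 + a1 * z + a2 * z * z)
    section = [b0, 0.0, 0.0, 1.0, a1, a2]    # scipy "sos" row: b0 b1 b2 a0 a1 a2
    return np.tile(section, (order, 1))

# Example: filter one second of white noise through a 1 kHz channel.
fs = 16000
sos = gammatone_like_sos(1000.0, fs)
y = sosfilt(sos, np.random.randn(fs))

Cascading identical low-order sections keeps the per-sample cost small, which is the practical appeal of this family of filters.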

B. Raj et al. apply perceptual modeling to the automatic speech recognition problem in "An FFT-based companding front end for noise-robust automatic speech recognition." The authors describe efficient FFT-based processing that mimics two-tone suppression, a key attribute of simultaneous masking. The processing involves a bank of relatively wide filters, followed by a compressive nonlinearity, then relatively narrow filters, and finally an expansion stage. The net result is that strong spectral components tend to reduce the level of weaker neighboring spectral components, a form of spectral peak enhancement. The authors apply this processing as a front end for a mel-cepstrum HMM-based automatic speech recognition algorithm and demonstrate improved performance.
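To make the wide-filter / compress / narrow-filter / expand chain concrete, here is a schematic Python sketch operating on a single FFT magnitude spectrum. It is not the authors' implementation: the moving-average smoothers, the exponent p, and the function names are arbitrary choices made only to illustrate how a strong component can suppress a weak neighbor.

import numpy as np

def smooth(mag, width):
    # Crude stand-in for a bank of overlapping filters: a moving average
    # of the magnitude spectrum with the given width (in bins).
    kernel = np.ones(width) / width
    return np.convolve(mag, kernel, mode="same")

def compand_spectrum(mag, wide=9, narrow=3, p=0.3, eps=1e-8):
    # 1) relatively wide filters -> local spectral level around each bin
    broad = smooth(mag, wide)
    # 2) compressive nonlinearity: bins near a strong component are
    #    attenuated more, because "broad" is dominated by that component
    compressed = mag * (broad + eps) ** (p - 1.0)
    # 3) relatively narrow filters
    narrowed = smooth(compressed, narrow)
    # 4) expansion based on the narrow-band level, roughly undoing the
    #    compression for isolated components but not for masked neighbors
    return narrowed * (smooth(narrowed, narrow) + eps) ** (1.0 / p - 1.0)

# Example: peak-enhance one frame before computing mel-cepstral features.
fs, n = 16000, 512
frame = np.random.randn(n) * np.hanning(n)
enhanced = compand_spectrum(np.abs(np.fft.rfft(frame)))

In this toy version an isolated strong tone passes through nearly unchanged, while a weak tone close to it is pushed down, mirroring the two-tone suppression and spectral peak enhancement described above.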