Multi-Channel Sub-Band Speech Recognition

Iain A. McCowan
Speech Research Laboratory, RCSAVT, School of EESE, Queensland University of Technology, GPO Box 2434, Brisbane QLD 4001, Australia
Email: [email protected]

Sridha Sridharan
Speech Research Laboratory, RCSAVT, School of EESE, Queensland University of Technology, GPO Box 2434, Brisbane QLD 4001, Australia
Email: [email protected]

Received 22 December 2000 and in revised form 16 February 2001

Two distinct lines of research in robust speech recognition are the use of microphone arrays for signal enhancement and the use of independent frequency sub-band models for robust recognition. In this article, we propose and investigate the integration of these two techniques at two different levels. First, a broad-band beamforming microphone array integrates naturally with sub-band speech recognition, because the beamformer is implemented as a combination of band-limited sub-arrays. Rather than recombining the sub-array outputs into a single enhanced signal, we fuse the outputs of separate hidden Markov models (HMMs), each trained on one sub-array frequency band. Second, we propose a dynamic sub-band weighting algorithm that uses the cross- and auto-spectral densities of the microphone inputs to estimate the reliability of each frequency band. The proposed multi-channel sub-band system is evaluated on an isolated digit recognition task and compared with both a standard full-band microphone array system and a single-channel sub-band system.

Keywords and phrases: microphone array, sub-band, beamforming, speech recognition.
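The second level of integration, weighting each sub-band by a reliability estimate derived from the microphone spectra, can be illustrated with a small sketch. The abstract does not give the authors' reliability formula or recombination rule, so the code below rests on stated assumptions. It assumes the microphone spectra are already time-aligned toward the speaker. It takes each band's reliability to be the summed real cross-spectral density (averaged over microphone pairs) divided by the summed auto-spectral density (averaged over microphones). This ratio tends toward 1 when a band is dominated by a signal that is coherent across the array and toward 0 when the inter-microphone noise is uncorrelated. The reliabilities are normalized into weights, and the per-band HMM log-likelihoods are combined by a weighted sum. The function names `band_reliability`, `band_weights`, and `fuse_log_likelihoods` are hypothetical, and plain Python is used only to keep the example dependency-free.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical layout: frames[t][m][k] is the complex STFT value of microphone m
# at frame t and frequency bin k, assumed already time-aligned toward the speaker.
Frames = Sequence[Sequence[Sequence[complex]]]


def band_reliability(frames: Frames, band_bins: Sequence[Tuple[int, int]]) -> List[float]:
    """Assumed reliability per sub-band (not the paper's exact formula):
    summed cross-spectral density over summed auto-spectral density, clipped to [0, 1].
    band_bins holds (lo, hi) bin ranges with hi exclusive."""
    n_mics = len(frames[0])
    if n_mics < 2:
        raise ValueError("cross-spectral densities need at least two microphones")
    n_bins = len(frames[0][0])
    pairs = list(combinations(range(n_mics), 2))
    auto = [0.0] * n_bins   # auto-spectral density, averaged over microphones
    cross = [0.0] * n_bins  # real part of cross-spectral density, averaged over pairs
    # Accumulate over frames; the common 1/T factor cancels in the ratio below.
    for frame in frames:
        for k in range(n_bins):
            auto[k] += sum(abs(frame[m][k]) ** 2 for m in range(n_mics)) / n_mics
            cross[k] += sum(
                (frame[i][k] * frame[j][k].conjugate()).real for i, j in pairs
            ) / len(pairs)
    reliabilities = []
    for lo, hi in band_bins:
        phi_auto = sum(auto[lo:hi])
        phi_cross = sum(cross[lo:hi])
        ratio = phi_cross / phi_auto if phi_auto > 0 else 0.0
        # Negative cross-power carries no coherent signal, so clip to [0, 1].
        reliabilities.append(min(max(ratio, 0.0), 1.0))
    return reliabilities


def band_weights(reliabilities: Sequence[float]) -> List[float]:
    """Normalize reliabilities into weights summing to one (uniform if all are zero)."""
    total = sum(reliabilities)
    if total <= 0:
        return [1.0 / len(reliabilities)] * len(reliabilities)
    return [r / total for r in reliabilities]


def fuse_log_likelihoods(band_scores: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> List[float]:
    """Assumed recombination rule: weighted sum of per-band log-likelihoods.
    band_scores[b][v] is the score of vocabulary word v from the HMM of sub-band b."""
    n_words = len(band_scores[0])
    return [
        sum(w * scores[v] for w, scores in zip(weights, band_scores))
        for v in range(n_words)
    ]


if __name__ == "__main__":
    # Toy input: 1 frame, 2 microphones, 4 bins split into two sub-bands.
    # Band 0 is in phase across the microphones; band 1 is in quadrature.
    frames = [[[1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j],
               [1 + 0j, 1 + 0j, 1j, 1j]]]
    rel = band_reliability(frames, [(0, 2), (2, 4)])
    w = band_weights(rel)
    fused = fuse_log_likelihoods([[-10.0, -12.0], [-20.0, -5.0]], w)
    print(rel, w, fused)
    print("recognized word index:", max(range(len(fused)), key=fused.__getitem__))
```

In this toy input the second band's microphone signals are in quadrature, so their cross-spectral density has zero real part and the band receives zero weight; the fused scores then follow the first band alone, whereas equal weighting would let the unreliable band change the decision. Clipping negative ratios and falling back to uniform weights are design choices of this sketch, not details taken from the article.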

1. INTRODUCTION

An emerging area of research is the use of microphone arrays for speech enhancement. In particular, microphone arrays have shown considerable promise for improving the performance of hands-free speech recognition systems in adverse environments [1, 2]. While such systems perform well, there is potential for further improvement through closer integration of the multi-channel input with the speech recognition system. Brandstein [3] observes that, whereas single-channel speech enhancement and robust recognition techniques have sought to exploit various features of the speech signal, multi-channel techniques to date have focused primarily on improving the spatial filtering process. He suggests that some current limitations of the field could be addressed by multi-channel techniques based on explicit modeling of speech characteristics. In this article, we investigate the integration of a sub-band speech recognition system with a microphone array. Sub-band speech recognition is a relatively new field of research that has been shown to improve robustness to noise that corrupts frequency bands non-uniformly [4, 5]. The sub-band approach is motivated by psychoacoustic evidence that auditory processing decisions in humans are formed by combining independently processed frequency sub-bands [6, 7]. The proposed system integrates the microphone array with sub-band spe