Model Compensation Approach Based on Nonuniform Spectral Compression Features for Noisy Speech Recognition


Research Article

Geng-Xin Ning, Gang Wei, and Kam-Keung Chu

School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China

Received 8 October 2005; Revised 20 December 2006; Accepted 20 December 2006

Recommended by Douglas O’Shaughnessy

This paper presents a novel model compensation (MC) method for mel-frequency cepstral coefficient (MFCC) features with signal-to-noise-ratio- (SNR-) dependent nonuniform spectral compression (SNSC). Though these new MFCCs derived from an SNSC scheme have been shown to be robust features under matched conditions, they suffer from serious mismatch when the reference models are trained at different SNRs and in different environments. To overcome this drawback, a compressed mismatch function is defined for the static observations with nonuniform spectral compression. The means and variances of the static features with spectral compression are derived according to this mismatch function. Experimental results show that the proposed method provides better recognition accuracy than conventional MC methods using uncompressed features, especially at very low SNR under different noises. Moreover, the computational complexity of the new compensation method is only slightly above that of conventional MC methods.

Copyright © 2007 Geng-Xin Ning et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

The problem of achieving robust speech recognition in noisy environments has attracted much interest over the past decades. However, drastic degradation of performance may still occur when a recognizer operates under noisy conditions. Solutions to this problem can generally be divided into three categories: inherently robust feature representation [1], speech enhancement schemes [2], and model-based compensation [3–6]. More details are reviewed in [7]. Recently, different speech analyses based on psychoacoustics have been reported in the literature [8]. The well-known perceptual linear prediction (PLP) [9] uses critical band filtering followed by equal-loudness pre-emphasis to simulate, respectively, the frequency resolution and frequency sensitivity of the auditory system. Cubic-root spectral magnitude compression with a fixed compression root is subsequently used to approximate the intensity-to-loudness conversion. However, it is suboptimal to use a constant root for compressing all the filter bank outputs, because a single root over-compresses some outputs while under-compressing others.
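The contrast drawn above between a constant compression root and a separate root per filter bank output can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the frame energies, and the per-band root values are hypothetical and are not taken from the paper, which derives its roots from the SNR in a way not reproduced here.

```python
def compress_fixed(energies, root=1.0 / 3.0):
    """Raise every filter bank output to the same power (cubic root by default)."""
    return [e ** root for e in energies]


def compress_nonuniform(energies, roots):
    """Raise each filter bank output to its own power, one root per band."""
    if len(energies) != len(roots):
        raise ValueError("need exactly one root per filter bank output")
    return [e ** r for e, r in zip(energies, roots)]


# Hypothetical filter bank energies for one frame (illustrative values only).
frame = [8.0, 27.0, 64.0, 1.0]

# Constant root: every band is compressed equally strongly.
fixed = compress_fixed(frame)

# Hypothetical per-band roots (assumption): a root of 1.0 leaves a band
# uncompressed, while smaller roots compress it more strongly.
varied = compress_nonuniform(frame, [1.0 / 3.0, 0.5, 1.0, 0.25])
```

With a constant root, the third band is compressed as strongly as the first; with per-band roots, it can be left untouched while other bands are still compressed.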

A new kind of noise-resistant feature, based on an SNR-dependent nonuniform spectral compression scheme, was presented in [1]; it compresses the corrupted speech spectrum by an SNR-dependent root value. [1] has shown that the SNSC deri