Model Compensation Approach Based on Nonuniform Spectral Compression Features for Noisy Speech Recognition


Research Article

Geng-Xin Ning, Gang Wei, and Kam-Keung Chu

School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510640, China

Received 8 October 2005; Revised 20 December 2006; Accepted 20 December 2006

Recommended by Douglas O'Shaughnessy

This paper presents a novel model compensation (MC) method for the features of mel-frequency cepstral coefficients (MFCCs) with signal-to-noise-ratio- (SNR-) dependent nonuniform spectral compression (SNSC). Though these new MFCCs derived from an SNSC scheme have been shown to be robust features under matched conditions, they suffer from serious mismatch when the reference models are trained at different SNRs and in different environments. To overcome this drawback, a compressed mismatch function is defined for the static observations with nonuniform spectral compression. The means and variances of the static features with spectral compression are derived according to this mismatch function. Experimental results show that the proposed method achieves better recognition accuracy than conventional MC methods that use uncompressed features, especially at very low SNRs under different noise types. Moreover, the computational complexity of the new compensation method is only slightly above that of conventional MC methods.

Copyright © 2007 Geng-Xin Ning et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

The problem of achieving robust speech recognition in noisy environments has aroused much interest in the past decades. However, drastic degradation of performance may still occur when a recognizer operates under noisy conditions. Approaches to this problem can generally be divided into three categories: inherently robust feature representation [1], speech enhancement schemes [2], and model-based compensation [3–6]. More details are reviewed in [7].

Recently, different speech analyses based on psychoacoustics have been reported in the literature [8]. The well-known perceptual linear prediction (PLP) [9] uses critical band filtering followed by equal-loudness pre-emphasis to simulate, respectively, the frequency resolution and frequency sensitivity of the auditory system. Cubic-root spectral magnitude compression with a fixed compression root is subsequently used to approximate the intensity-to-loudness conversion. However, using a constant root to compress all the filter bank outputs is suboptimal, because a single root over-compresses some outputs and under-compresses others at the same time.
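The difference between a constant compression root and band-specific roots can be sketched as follows. This is only an illustrative sketch: the function names, the sample filter bank values, and the per-band roots are assumptions made here for demonstration, and no particular rule for choosing the roots is implied.

```python
def compress_fixed(fbank, root=1.0 / 3.0):
    """Apply one constant compression root to every filter bank output.

    The default root of 1/3 corresponds to cubic-root compression.
    """
    return [energy ** root for energy in fbank]


def compress_per_band(fbank, roots):
    """Apply a separate compression root to each filter bank output.

    The roots are supplied by the caller; how to choose them is outside
    the scope of this sketch (hypothetical values are used below).
    """
    if len(fbank) != len(roots):
        raise ValueError("need exactly one root per filter bank output")
    return [energy ** root for energy, root in zip(fbank, roots)]


if __name__ == "__main__":
    # Hypothetical filter bank outputs: one strong band, two weak bands.
    fbank = [1000.0, 8.0, 1.5]

    # Constant root: every band is compressed by the same exponent.
    print(compress_fixed(fbank))

    # Band-specific roots (illustrative values only): each band gets
    # its own exponent instead of sharing one.
    print(compress_per_band(fbank, [0.8, 0.4, 0.2]))
```

With a constant root, strong and weak bands are treated identically, which is the limitation noted above; the per-band variant only shows the mechanism by which that limitation can be relaxed.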

A new kind of noise-resistant feature employing an SNR-dependent nonuniform spectral compression scheme was presented in [1]; it compresses the corrupted speech spectrum by an SNR-dependent root value. [1] has shown that the SNSC deri
