Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition

Research Article

Youngjoo Suh, Sungtak Kim, and Hoirin Kim
School of Engineering, Information and Communications University, 119 Munjiro, Yuseong-Gu, Daejeon 305-732, South Korea

Received 1 February 2006; Revised 26 November 2006; Accepted 1 February 2007

Recommended by Mark Gales

A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims not only to compensate for the acoustic mismatch between training and test environments but also to reduce two fundamental limitations of the conventional histogram equalization method: the discrepancy between the phonetic distributions of training and test speech data, and the nonmonotonic transformation caused by the acoustic mismatch. The algorithm employs multiple class-specific reference and test cumulative distribution functions, classifies noisy test features into their corresponding classes, and equalizes the features by using their corresponding class reference and test distributions. Minimum mean-square error log-spectral amplitude (MMSE-LSA)-based speech enhancement is applied just prior to the baseline feature extraction to reduce the corruption by additive noise. Experiments on the Aurora2 database prove the effectiveness of the proposed method, reducing relative errors by 62% over the mel-cepstral-based features and by 23% over the conventional histogram equalization method, respectively.

Copyright © 2007 Youngjoo Suh et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
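The per-class equalization step described above can be illustrated with a minimal Python sketch, assuming one cepstral dimension, per-frame class labels, and empirical cumulative distribution functions; the function and variable names are illustrative, not taken from the paper. Each test coefficient x with class c is mapped as x' = F_ref,c^{-1}(F_test,c(x)), i.e., through its class-specific test CDF and the inverse of the corresponding class reference CDF.

import numpy as np

def empirical_cdf(values, points):
    """Evaluate the empirical CDF of `values` at `points`."""
    sorted_vals = np.sort(values)
    return np.searchsorted(sorted_vals, points, side="right") / len(sorted_vals)

def class_based_heq(test_feats, class_ids, ref_feats_by_class):
    """Class-based histogram equalization of one cepstral coefficient.

    test_feats         : (T,) array, one coefficient over T test frames
    class_ids          : (T,) array, class label assigned to each frame
    ref_feats_by_class : dict mapping class label -> reference (training) samples
    """
    equalized = np.empty_like(test_feats, dtype=float)
    for c, ref in ref_feats_by_class.items():
        idx = np.where(class_ids == c)[0]
        if idx.size == 0:
            continue
        test_c = test_feats[idx]
        # class-specific test CDF evaluated at each test sample
        probs = empirical_cdf(test_c, test_c)
        # map through the inverse of the class-specific reference CDF
        ref_sorted = np.sort(np.asarray(ref))
        ref_cdf = np.arange(1, len(ref_sorted) + 1) / len(ref_sorted)
        equalized[idx] = np.interp(probs, ref_cdf, ref_sorted)
    return equalized

In the paper's framework, the class labels would come from classifying the noisy test features, and the reference distributions would be estimated from clean training data; the sketch only shows the mapping itself.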

1. INTRODUCTION

The performance of automatic speech recognition (ASR) systems degrades severely when they are employed in acoustic environments mismatched with those used for training. The main causes of this acoustic mismatch are corruption by additive noise and channel distortion, both of which are commonly encountered in real-world ASR applications. To cope with this problem, robust speech recognition has become one of the most crucial issues in speech recognition research. Most current robust speech recognition methods can be categorized into three areas: signal space, feature space, and model space [1]. Compared to the other two categories, the feature space approach has been widely employed due to advantages such as easy implementation, low computational complexity, and effective performance improvements. Acoustic environments corrupted by additive noise and channel distortion act as a nonlinear transformation in the cepstral or log-spectral feature space [2]. Thus, classical linear feature space methods such as cepstral mean subtraction or cepstral mean and variance normalization have substantial limitations even though they yield significant performance improvements under noisy environments [3
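As a point of reference for these classical linear methods, cepstral mean and variance normalization amounts to standardizing each cepstral dimension with per-utterance statistics. A minimal sketch, with an illustrative function name and epsilon value not taken from the paper:

import numpy as np

def cmvn(cepstra):
    """Cepstral mean and variance normalization over one utterance.

    cepstra : (T, D) array of T frames with D cepstral coefficients.
    Returns features with zero mean and unit variance per dimension.
    """
    mean = cepstra.mean(axis=0)
    std = cepstra.std(axis=0) + 1e-10  # guard against division by zero
    return (cepstra - mean) / std

Because this transformation is linear per dimension, it cannot undo the nonlinear distortions described above, which motivates nonlinear methods such as histogram equalization.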