A Review of Signal Subspace Speech Enhancement and Its Application to Noise Robust Speech Recognition

Research Article

Kris Hermus, Patrick Wambacq, and Hugo Van hamme

Department of Electrical Engineering - ESAT, Katholieke Universiteit Leuven, 3001 Leuven-Heverlee, Belgium

Received 24 October 2005; Revised 7 March 2006; Accepted 30 April 2006

Recommended by Kostas Berberidis

The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech, and on the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound to the performance that can be achieved by any subspace-based method. Automatic speech recognition (ASR) experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing essential signal information for the recogniser’s back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.

Copyright © 2007 Kris Hermus et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

One particular class of speech enhancement techniques that has gained a lot of attention is signal subspace filtering. In this approach, a nonparametric linear estimate of the unknown clean-speech signal is obtained based on a decomposition of the observed noisy signal into mutually orthogonal signal and noise subspaces. This decomposition is possible under the assumption of a low-rank linear model for speech and an uncorrelated additive (white) noise interference. Under these conditions, the energy of less correlated noise spreads over the whole observation space while the energy of the correlated speech components is concentrated in a subspace thereof. Also, the signal subspace can be recovered consistently from the noisy data. Generally speaking, noise reduction is obtained by nulling the noise subspace and by removing the noise contribution in the signal subspace. The idea to perform subspace-based signal estima
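The general recipe described above (decompose the noisy observation, null the noise subspace, and remove the noise contribution in the signal subspace) can be illustrated with a minimal sketch. It is not the estimator derived in the paper. It assumes white noise of known variance and a fixed, user-chosen signal rank, and it uses one common minimum-variance-style gain on the singular values. The function names (hankel_matrix, average_antidiagonals, subspace_enhance) and the parameters n_cols, rank, and noise_var are hypothetical and chosen only for illustration.

```python
import numpy as np


def hankel_matrix(x, n_cols):
    """Stack overlapping windows of x into an (m x n_cols) Hankel matrix."""
    m = len(x) - n_cols + 1
    return np.array([x[i:i + n_cols] for i in range(m)])


def average_antidiagonals(H):
    """Map a matrix back to a 1-D signal by averaging each anti-diagonal."""
    m, n = H.shape
    out = np.zeros(m + n - 1)
    counts = np.zeros(m + n - 1)
    for i in range(m):
        out[i:i + n] += H[i]
        counts[i:i + n] += 1
    return out / counts


def subspace_enhance(noisy, noise_var, rank, n_cols=20):
    """Illustrative subspace filter for one frame (white-noise assumption)."""
    H = hankel_matrix(np.asarray(noisy, dtype=float), n_cols)
    m = H.shape[0]
    U, s, Vt = np.linalg.svd(H, full_matrices=False)

    # Assumption: white noise adds roughly m * noise_var to every squared
    # singular value, so this gain shrinks each retained component accordingly.
    s2 = np.maximum(s ** 2, np.finfo(float).tiny)
    gains = np.clip(1.0 - m * noise_var / s2, 0.0, None)

    # Null the noise subspace: discard everything beyond the assumed rank.
    gains[rank:] = 0.0

    H_hat = (U * (gains * s)) @ Vt
    return average_antidiagonals(H_hat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(400)
    clean = np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.sin(2 * np.pi * 0.12 * t)
    noisy = clean + rng.normal(0.0, 0.5, t.size)
    enhanced = subspace_enhance(noisy, noise_var=0.25, rank=4)
    print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
    print("enhanced MSE:", np.mean((enhanced - clean) ** 2))
```

In this toy setting the clean signal is a sum of two sinusoids, so a rank of 4 matches the low-rank model exactly. Real speech frames only approximately satisfy such a model, and coloured noise would require a noise correlation estimate and prewhitening, which this sketch omits.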
