Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model
Thomas Lotter
Institute of Communication Systems and Data Processing, RWTH Aachen University of Technology, RWTH Aachen, 52056 Aachen, Germany
Siemens Audiological Engineering Group, Gebbertstrasse 125, 91058 Erlangen, Germany
Email: [email protected]

Peter Vary
Institute of Communication Systems and Data Processing, RWTH Aachen University of Technology, RWTH Aachen, 52056 Aachen, Germany
Email: [email protected]

Received 7 June 2004; Revised 17 September 2004; Recommended for Publication by Jacob Benesty

This contribution presents two spectral amplitude estimators for acoustical background noise suppression based on maximum a posteriori estimation and super-Gaussian statistical modelling of the speech DFT amplitudes. The probability density function of the speech spectral amplitude is modelled with a simple parametric function, which allows a high approximation accuracy for Laplace- or Gamma-distributed real and imaginary parts of the speech DFT coefficients. Also, the statistical model can be adapted to optimally fit the distribution of the speech spectral amplitudes for a specific noise reduction system. Based on the super-Gaussian statistical model, computationally efficient maximum a posteriori speech estimators are derived, which outperform the commonly applied Ephraim-Malah algorithm.

Keywords and phrases: speech enhancement, MAP estimation, speech model.
1. INTRODUCTION
The reduction of acoustical background noise using a single microphone is important for improving the quality of speech communication systems in the context of digital hearing aids, speech recognition, hands-free telephony, or teleconferencing. Although single-microphone speech enhancement has been a research topic for decades, the estimation of a clean speech signal from its noisy observation remains a challenging task, especially due to the wide variety of environmental noises.

If the disturbing noise is assumed to be truly environmental, that is, it originates, for example, from machines, cars, or several persons talking at the same time, then the specific properties of speech compared to unwanted noise, namely nonwhiteness, nonstationarity, and non-Gaussianity, make it possible to distinguish speech from noise. Nonwhiteness means that the short-time spectrum of speech is generally less flat than that of acoustic noise. This property can be exploited by separating speech and noise in the spectral domain. The concept of spectral domain noise attenuation was introduced more than twenty years ago by Boll [1] as the subtraction of an estimated noise spectral magnitude from the noisy spectral magnitude. To estimate the noise power spectral density, the second property, nonstationarity, is exploited by averaging DFT squared magnit

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
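The magnitude subtraction attributed to Boll [1] in the introduction can be illustrated for a single signal frame. The following is only a minimal sketch under assumptions that are not part of the original text: the function names `dft`, `idft`, and `spectral_subtraction` and the `floor` parameter are hypothetical, the noise magnitude per DFT bin is assumed to be given, and the noisy phase is reused unchanged. It is not the estimator proposed in this paper.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def spectral_subtraction(noisy_frame, noise_magnitude, floor=0.0):
    """Subtract an estimated noise magnitude from each noisy DFT bin.

    noisy_frame     : real-valued samples of one frame
    noise_magnitude : assumed noise magnitude estimate, one value per bin
    floor           : hypothetical spectral floor, as a fraction of the
                      noisy magnitude, applied when the difference is negative
    """
    Y = dft(noisy_frame)
    S_hat = []
    for Yk, Nk in zip(Y, noise_magnitude):
        R = abs(Yk)                      # noisy spectral magnitude
        A = max(R - Nk, floor * R)       # subtracted magnitude, clamped
        S_hat.append(A * cmath.exp(1j * cmath.phase(Yk)))  # keep noisy phase
    # For a real frame and a symmetric noise estimate the result is real
    # up to rounding, so the imaginary part is discarded.
    return [s.real for s in idft(S_hat)]


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * t / 8) for t in range(8)]
    print(spectral_subtraction(frame, [2.0] * 8))
```

With a zero noise estimate the frame is returned unchanged, and with a noise estimate larger than every bin magnitude the output is zero, which shows why such schemes typically need a floor or a more careful gain rule.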