Multichannel Direction-Independent Speech Enhancement Using Spectral Amplitude Estimation
Thomas Lotter
Institute of Communication Systems and Data Processing, Aachen University (RWTH), Templergraben 55, D-52056 Aachen, Germany
Email: [email protected]

Christian Benien
Philips Research Center, Aachen, Weißhausstraße 2, D-52066 Aachen, Germany
Email: [email protected]

Peter Vary
Institute of Communication Systems and Data Processing, Aachen University (RWTH), Templergraben 55, D-52056 Aachen, Germany
Email: [email protected]

Received 25 November 2002 and in revised form 12 March 2003

This paper introduces two short-time spectral amplitude estimators for speech enhancement with multiple microphones. Based on joint Gaussian models of speech and noise Fourier coefficients, the clean speech amplitudes are estimated with respect to the MMSE or the MAP criterion. The estimators outperform single-microphone minimum mean square amplitude estimators when the speech components are highly correlated and the noise components are sufficiently uncorrelated. Whereas the first (MMSE) estimator also requires knowledge of the direction of arrival, the second (MAP) estimator performs a direction-independent noise reduction. The estimators are generalizations of the well-known single-channel MMSE estimator derived by Ephraim and Malah (1984) and the MAP estimator derived by Wolfe and Godsill (2001), respectively.

Keywords and phrases: speech enhancement, microphone arrays, spectral amplitude estimation.
1. INTRODUCTION
Speech communication appliances such as voice-controlled devices, hearing aids, and hands-free telephones often suffer from poor speech quality due to background noise and room reverberation. Multiple-microphone techniques such as beamformers can improve speech quality and intelligibility by exploiting the spatial diversity of speech and noise sources. These techniques can be divided into fixed and adaptive beamformers. A fixed beamformer combines the noisy signals by a time-invariant filter-and-sum operation. The filters can be designed to achieve constructive superposition towards a desired direction (delay-and-sum beamformer) or to maximize the SNR improvement (superdirective beamformer) [1, 2, 3]. Adaptive beamformers commonly consist of a fixed beamformer steered towards a fixed desired direction and an adaptive null steering towards moving interfering sources [4, 5].
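For illustration, the following is a minimal sketch of a frequency-domain delay-and-sum beamformer for a uniform linear array. It is not taken from the paper; the function name, array geometry, steering convention, and uniform channel weights are assumptions made only for this example.

    import numpy as np

    def delay_and_sum(stft, mic_positions, theta, fs, n_fft, c=343.0):
        """Steer a uniform linear array towards angle theta (radians from broadside).

        stft:          complex STFT, shape (M, n_frames, n_bins)
        mic_positions: microphone x-coordinates in meters, shape (M,)
        """
        n_bins = stft.shape[2]
        freqs = np.arange(n_bins) * fs / n_fft            # bin center frequencies
        delays = mic_positions * np.sin(theta) / c        # per-microphone propagation delays
        # Phase factors that time-align all channels towards the assumed direction
        steering = np.exp(2j * np.pi * np.outer(delays, freqs))   # shape (M, n_bins)
        aligned = stft * steering[:, None, :]
        return aligned.mean(axis=0)                       # uniform weights: delay and sum

A superdirective beamformer would replace the uniform averaging weights by frequency-dependent weights derived from the noise coherence matrix; the steering step stays the same.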
All beamformer techniques assume the target direction of arrival (DOA) to be known a priori or assume that it can be estimated with sufficient accuracy. The performance of such a beamforming system usually degrades dramatically if the DOA knowledge is erroneous. To estimate the DOA at runtime, time difference of arrival (TDOA)-based locators evaluate the maximum of a weighted cross correlation [6, 7]. Subspace methods can detect multiple sources by decomposing the spatial covariance matrix into a signal subspace and a noise subspace. However, the performance of all DOA estimation algorithms suffers severely from room reverberation.
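As a concrete instance of such a weighted cross correlation, the sketch below estimates the TDOA between two microphone signals with the generalized cross correlation with phase transform (GCC-PHAT). This is a standard method, not necessarily the one used in [6, 7], and the function and parameter names are illustrative only.

    import numpy as np

    def gcc_phat_tdoa(x1, x2, fs, max_tau=None):
        """Estimate the TDOA (in seconds) between two microphone signals."""
        n = len(x1) + len(x2)
        X1 = np.fft.rfft(x1, n)
        X2 = np.fft.rfft(x2, n)
        cross = X1 * np.conj(X2)
        cross /= np.abs(cross) + 1e-12                    # PHAT weighting: keep phase only
        cc = np.fft.irfft(cross, n)                       # generalized cross correlation
        max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
        cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))   # center lag 0
        lag = np.argmax(np.abs(cc)) - max_shift           # lag of the correlation maximum
        return lag / fs

The estimated TDOA can then be converted to a DOA estimate via the known microphone spacing, which is exactly the step whose reliability degrades in reverberant rooms.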