Speech Source Separation in Convolutive Environments Using Space-Time-Frequency Analysis
Shlomo Dubnov,1 Joseph Tabrikian,2 and Miki Arnon-Targan2

1 CALIT2, University of California, San Diego, CA 92093, USA
2 Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
Received 10 February 2005; Revised 28 September 2005; Accepted 4 October 2005

We propose a new method for speech source separation that is based on directionally-disjoint estimation of the transfer functions between microphones and sources at different frequencies and at multiple times. The spatial transfer functions are estimated from eigenvectors of the microphones' correlation matrix. Smoothing and association of transfer function parameters across different frequencies are performed by simultaneous extended Kalman filtering of the amplitude and phase estimates. This approach allows transfer function estimation even if the number of sources is greater than the number of microphones, and it can operate for both wideband and narrowband sources. The performance of the proposed method was studied via simulations and the results show good performance.

Copyright © 2006 Shlomo Dubnov et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
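The abstract's core estimation step can be illustrated with a toy sketch (not the paper's implementation): at a frequency bin dominated by a single source, the principal eigenvector of the microphones' spatial correlation matrix is, up to a scale factor, the transfer vector from that source to the microphones. All signal values below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 2 microphones, one source active at a given
# frequency bin; `a` plays the role of the unknown transfer vector.
a = np.array([1.0, 0.6 * np.exp(-1j * 0.8)])
n_frames = 500
s = (rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)) / np.sqrt(2)
noise = 0.05 * (rng.standard_normal((2, n_frames)) + 1j * rng.standard_normal((2, n_frames)))
x = np.outer(a, s) + noise          # STFT coefficients at one bin, all frames

# Sample correlation matrix of the microphone signals at this bin
R = x @ x.conj().T / n_frames

# Eigenvector of the largest eigenvalue ~ transfer vector (up to scale);
# np.linalg.eigh returns eigenvalues in ascending order.
w, V = np.linalg.eigh(R)
v = V[:, -1]
v = v / v[0]                        # fix scale/phase relative to microphone 1

err = np.linalg.norm(v - a / a[0])
print(err)                          # small: estimate matches the true vector
```

The normalization by the first component removes the inherent scale and phase ambiguity of the eigenvector, which is why the paper's transfer functions are only identifiable up to such a reference.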
1. INTRODUCTION
Many audio communication and entertainment applications deal with acoustic signals that contain combinations of several acoustic sources in a mixture that overlaps in time and frequency. In recent years, there has been a growing interest in methods that are capable of separating audio signals from microphone arrays using blind source separation (BSS) techniques [1]. In contrast to most research work in BSS, which assumes multiple microphones, the audio data in most practical situations is limited to stereo recordings. Moreover, the majority of the potential applications of BSS in the audio realm consider separation of simultaneous audio sources in reverberant or echo environments, such as a room or the inside of a vehicle. These applications deal with convolutive mixtures [2] that often contain long impulse responses that are difficult to estimate or invert. In this paper, we consider a simpler but still practical and largely overlooked situation of mixtures that contain a combination of source signals in weak reverberation environments, such as speech or music recorded with close microphones. The main mixing effect in such a case is a direct-path delay and possibly a small combination of multipath delays that can be described by convolution with a relatively short impulse response. Recently, several works proposed separation of multiple signals when additional assumptions
are imposed on the signals in the time-frequency (TF) domain. In [3, 4], an assumption that each source occupies separate regions in the short-time Fourier transform (STFT) representation, computed with an analysis window W(t), is exploited (so-called W-disjoint orthogonality).
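A minimal single-frame sketch of the W-disjoint orthogonality idea (the source frequencies and window length below are assumptions for illustration): when two sources occupy disjoint STFT regions, a binary mask built from their relative magnitudes, applied to the mixture, recovers each source almost exactly.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)     # toy sources, disjoint in frequency
s2 = np.sin(2 * np.pi * 1320 * t)
x = s1 + s2                          # single-channel mixture

# One STFT frame with a Hann analysis window W(t)
n = 512
w = np.hanning(n)
frame = lambda sig: np.fft.rfft(w * sig[:n])

X, S1, S2 = frame(x), frame(s1), frame(s2)

# Oracle binary mask: keep the bins where source 1 dominates.
# Under W-disjoint orthogonality, masking the MIXTURE recovers source 1.
mask1 = np.abs(S1) > np.abs(S2)
S1_hat = X * mask1

err = np.linalg.norm(S1_hat - S1) / np.linalg.norm(S1)
print(err)                           # small relative error
```

In practice the mask is of course not available as an oracle; methods such as [3, 4] estimate it from inter-channel amplitude and delay cues, which is the gap the TF-domain assumption is meant to close.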