Robust Adaptive Time Delay Estimation for Speaker Localization in Noisy and Reverberant Acoustic Environments



Simon Doclo
Department of Electrical Engineering, Katholieke Universiteit Leuven, ESAT-SISTA, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium

Marc Moonen
Department of Electrical Engineering, Katholieke Universiteit Leuven, ESAT-SISTA, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium

Received 23 September 2002 and in revised form 2 June 2003

Two adaptive algorithms are presented for robust time delay estimation (TDE) in acoustic environments with a large amount of background noise and reverberation. Recently, an adaptive eigenvalue decomposition (EVD) algorithm has been developed for TDE in highly reverberant acoustic environments. In this paper, we extend the adaptive EVD algorithm to noisy and reverberant acoustic environments by deriving an adaptive stochastic gradient algorithm for the generalized eigenvalue decomposition (GEVD) or by prewhitening the noisy microphone signals. We have performed simulations using a localized and a diffuse noise source for several SNRs, showing that the time delays can be estimated more accurately using the adaptive GEVD algorithm than using the adaptive EVD algorithm. In addition, we have analyzed the sensitivity of the adaptive GEVD algorithm with respect to the accuracy of the noise correlation matrix estimate, showing that its performance may be quite sensitive, especially for low SNR scenarios.

Keywords and phrases: time delay estimation, acoustic source localization, generalized eigenvalue decomposition, stochastic gradient.
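The adaptive EVD idea referred to above can be illustrated with a minimal sketch. It assumes the commonly cited formulation (in the style of Benesty's adaptive EVD algorithm, not code from this paper): for x1 = h1 * s and x2 = h2 * s, the cross relation h2 * x1 - h1 * x2 = 0 means that u = [h2; -h1] lies in the null space of the correlation matrix R of the stacked microphone signals. A unit-norm-constrained LMS update then drives u toward the eigenvector with the smallest eigenvalue, and the delay is read off as the lag between the dominant peaks of the two halves of u. The function name adaptive_evd_tde, the filter length L = 20, the step size mu = 0.01, the center-tap initialization, and the noise-free pure-delay test signals are all illustrative assumptions.

```python
import numpy as np


def adaptive_evd_tde(x1, x2, L=20, mu=0.01):
    """Hypothetical sketch of an adaptive EVD time delay estimator.

    u = [h2_hat; -h1_hat] is adapted toward the eigenvector of the 2L x 2L
    correlation matrix of the stacked signals with the smallest eigenvalue,
    using a unit-norm-constrained LMS update. Returns the estimated delay of
    x2 relative to x1 (in samples) and the final u.
    """
    u = np.zeros(2 * L)
    u[L // 2] = 1.0  # assumed initialization: unit impulse mid-way in the h2_hat half
    for n in range(L - 1, len(x1)):
        # regressor x(n) = [x1(n), ..., x1(n-L+1), x2(n), ..., x2(n-L+1)]
        x = np.concatenate((x1[n - L + 1:n + 1][::-1], x2[n - L + 1:n + 1][::-1]))
        e = u @ x               # error e(n) = u^T x(n); zero for the ideal u
        u = u - mu * e * x      # stochastic gradient step on u^T R u
        u /= np.linalg.norm(u)  # renormalize to ||u|| = 1 (excludes the trivial u = 0)
    h2_hat, h1_hat = u[:L], -u[L:]
    # the delay is the lag between the dominant (direct-path) peaks
    tau = int(np.argmax(np.abs(h2_hat)) - np.argmax(np.abs(h1_hat)))
    return tau, u


# Usage: white-noise source, pure delays d1 = 2 and d2 = 5 samples, no noise.
rng = np.random.default_rng(0)
s = rng.standard_normal(5000)
d1, d2 = 2, 5
x1 = np.concatenate((np.zeros(d1), s))[: len(s)]
x2 = np.concatenate((np.zeros(d2), s))[: len(s)]
tau, u = adaptive_evd_tde(x1, x2)
print(tau)  # 3, i.e. d2 - d1
```

The renormalization step is what makes this an eigenvector search rather than plain LMS: without it, minimizing u^T R u would simply drive u to zero. When uncorrelated noise is added to x1 and x2, R gains a noise term and the ideal u is no longer in its null space, which is the situation the noisy-environment extensions in the abstract are meant to handle.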
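Why a GEVD (or prewhitening) helps in noise can be seen with a small batch computation. This is a sketch under stated assumptions, not the paper's adaptive stochastic gradient algorithm. Assume speech and noise are uncorrelated, so the noisy correlation matrix is R_x = R_s + R_v, and that the noise correlation matrix R_v is known. If the ideal vector u satisfies R_s u = 0, then R_x u = R_v u. Since every direction has the ratio u^T R_x u / u^T R_v u = 1 + u^T R_s u / u^T R_v u >= 1, u is the generalized eigenvector with the smallest generalized eigenvalue, equal to 1. Prewhitening with a Cholesky factor R_v = C C^T reduces this to an ordinary EVD of C^{-1} R_x C^{-T}. The symbols R_x, R_s, R_v, the function name gevd_min_via_prewhitening, and the random test matrices are hypothetical.

```python
import numpy as np


def gevd_min_via_prewhitening(Rx, Rv):
    """Smallest generalized eigenpair of (Rx, Rv), i.e. Rx u = lam * Rv u,
    obtained by prewhitening with the Cholesky factor Rv = C C^T."""
    C = np.linalg.cholesky(Rv)          # lower-triangular factor of the noise correlation
    Ci = np.linalg.inv(C)
    Rw = Ci @ Rx @ Ci.T                 # correlation matrix of the prewhitened signals C^{-1} x
    lam, W = np.linalg.eigh(Rw)         # ordinary EVD; eigenvalues in ascending order
    u = np.linalg.solve(C.T, W[:, 0])   # map back to the original domain: u = C^{-T} w
    return lam[0], u / np.linalg.norm(u)


# Synthetic check: Rs has a one-dimensional null space spanned by u_true,
# standing in for the (noise-free) cross-relation vector.
rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n - 1))
Rs = B @ B.T                            # rank n - 1, positive semidefinite
u_true = np.linalg.svd(B)[0][:, -1]     # unit vector with Rs @ u_true = 0
G = rng.standard_normal((n, n))
Rv = G @ G.T + 0.5 * np.eye(n)          # positive definite, spatially correlated noise
Rx = Rs + Rv                            # speech and noise assumed uncorrelated

lam, u = gevd_min_via_prewhitening(Rx, Rv)
print(lam, abs(u @ u_true))             # both numerically equal to 1

# For comparison, the plain EVD of Rx is generally pulled away from u_true by the noise.
v_evd = np.linalg.eigh(Rx)[1][:, 0]
print(abs(v_evd @ u_true))
```

The sketch also shows why an inaccurate noise estimate matters: the recovery of u_true relies on subtracting exactly the R_v that is present in R_x, so any mismatch between the assumed and true R_v leaves a residual noise term in the eigenproblem.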

1. INTRODUCTION

In many speech communication applications, such as teleconferencing, hands-free voice-controlled systems, and hearing aids, it is desirable to localize the dominant speaker. By using a microphone array, it is possible to determine the position of this speaker so that the microphone array can be electronically steered using a fixed (or adaptive) beamformer in order to provide spatially selective speech acquisition [1, 2]. In multimedia teleconferencing systems, the position of the speaker can be used not only for microphone array beamforming, but also for automatic video camera steering [3, 4] and for determining binaural cues for stereo imaging. It has been shown that the position of a speaker can be calculated from the time delays between the different microphone signals, for example, using maximum likelihood or least-squares methods [5, 6]. However, accurately estimating the time delays between the different microphone signals is not an easy task because of the room reverberation, the acoustic background noise, and the nonstationary character of the speech signal. Generally, room reverberation is considered to be the main problem for time delay estimation (TDE) [7], but acoustic background noise can also considerably decrease the performance of TDE algorithms. Whereas highly noisy situations are not very common in typical teleconferencin