Microphone Array Speaker Localizers Using Spatial-Temporal Information

Sharon Gannot¹ and Tsvi Gregory Dvorkind²

¹ School of Engineering, Bar-Ilan University, Ramat-Gan 52900, Israel
² Department of Electrical Engineering, Technion – Israel Institute of Technology, Technion City, Haifa 32000, Israel

Received 20 January 2005; Revised 17 May 2005; Accepted 22 August 2005

A dual-step approach to speaker localization based on a microphone array is addressed in this paper. In the first stage, which is not the main concern of this paper, the time difference between arrivals of the speech signal at each pair of microphones is estimated. These readings are combined in the second stage to obtain the source location. We focus on the second stage of the localization task and propose to exploit the speaker's smooth trajectory to improve the current position estimate. Three localization schemes that use the temporal information are presented. The first is a recursive form of the Gauss method. The other two are extensions of the Kalman filter to the nonlinear problem at hand, namely, the extended Kalman filter and the unscented Kalman filter. These methods are compared with other algorithms that do not make use of the temporal information. An extensive experimental study demonstrates the advantage of using the spatial-temporal methods. To gain some insight into the obtainable performance of the localization algorithm, an approximate analytical evaluation, verified by an experimental study, is conducted. This study shows that in common TDOA-based localization scenarios, where the microphone array has a small interelement spread relative to the source position, the elevation and azimuth angles can be accurately estimated, whereas the Cartesian coordinates as well as the range are poorly estimated.

Copyright © 2006 Hindawi Publishing Corporation. All rights reserved.

1. INTRODUCTION AND PROBLEM FORMULATION

Determining the spatial position of a speaker is of growing interest in video conference scenarios, where automated camera steering and tracking are required. Acoustic source localization might also be used as a preprocessing stage for speech enhancement algorithms that are based on microphone array beamformers. Methods for speaker localization usually comprise two stages. In the first stage, which is not the main concern of this paper, a microphone array is used to extract the time difference between arrivals of the speech signal at each pair of microphones. These readings are then processed by the second stage to obtain the source position. This paper focuses on the second algorithmic stage of the two-step approaches.

In the first algorithmic stage, the time difference of arrival (TDOA) is estimated using spatially separated microphone pairs. The classical method for performing this task is the generalized cross-correlation (GCC) algorithm [1]. Many improvements of this method for the reverberant case exist. Brandstein and Silverman used a robust estimate of the crosspower