Localization of Directional Sound Sources Supported by A Priori Information of the Acoustic Environment

Research Article

Zoltán Fodróczi 1 and András Radványi 2

1 Faculty of Information Technology, Pázmány Péter Catholic University, Práter u. 50/A, 1058 Budapest, Hungary

2 Analogic and Neural Computing Laboratory, Computer and Automation Research Institute, Hungarian Academy of Sciences, Lagymanyosi u. 11, 1111 Budapest, Hungary

Correspondence should be addressed to Zoltán Fodróczi, [email protected]

Received 6 November 2006; Revised 6 March 2007; Accepted 11 July 2007

Recommended by Douglas B. Williams

Speaker localization with microphone arrays has received significant attention in the past decade as a means of automatically tracking individuals in a closed space for videoconferencing systems, directed speech capture systems, and surveillance systems. Traditional techniques estimate the time difference of arrival (TDOA) between channels using the cross-correlation function. As we show in the context of speaker localization, these estimates yield poor results because of the joint effect of reverberation and the directivity of sound sources. In this paper, we present a novel method that utilizes a priori acoustic information about the monitored region, which makes it possible to localize directional sound sources by taking the effect of reverberation into account. The proposed method shows a significant improvement in performance over traditional methods in the “noise-free” condition. Further work is required to extend its capabilities to noisy environments.

Copyright © 2008 Z. Fodróczi and A. Radványi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

The inverse problem of localizing a source from signal measurements at an array of sensors is a classical problem in signal processing, with applications in sonar, radar, and acoustic engineering. In this paper, we focus on a subset of these efforts, in which a speaker is to be localized in a conference environment. Brandstein’s book [1] provides a comprehensive introduction to the state-of-the-art methods in this field. Generally, three classes of source localization algorithms are distinguished: (i) high-resolution spectral estimation [2, 3], (ii) steered beamformer energy response [4, 5], and (iii) estimation of the time difference of arrival (TDOA) [6–10]. Some algorithms combine features from more than one class, such as the accumulated correlation method [11], which has been shown [12] to combine the accuracy of beamforming with the computational efficiency of TDOA-based techniques [6–10]. In 1976, Knapp and Carter [13] proposed the generalized cross-correlation (GCC) method, which became the most popular technique for TDOA estimation. Since then, many new ideas have been proposed to deal more effectively with noise

an