A Robust Statistical-Based Speaker's Location Detection Algorithm in a Vehicular Environment


Research Article

Jwu-Sheng Hu, Chieh-Cheng Cheng, and Wei-Han Liu

Department of Electrical and Control Engineering, National Chiao Tung University, Hsinchu 300, Taiwan

Received 1 May 2006; Revised 27 July 2006; Accepted 26 August 2006

Recommended by Aki Harma

This work presents a robust speaker's location detection algorithm using a single linear microphone array. The algorithm can detect multiple speech sources under the assumption that nonoverlapped speech segments exist among the sources. That is, overlapped speech segments are treated as uncertain and are not used for detection. The location detection algorithm is derived from a previous work (2006), in which Gaussian mixture models (GMMs) are used to model phase difference distributions that are location-dependent but content- and speaker-independent. The proposed algorithm is proven to be robust against complex vehicular acoustics, including noise, reverberation, near-field, far-field, line-of-sight, and non-line-of-sight conditions, as well as microphone mismatch. An adaptive system architecture is developed to adjust the Gaussian mixture (GM) location model to environmental noise. A threshold adaptation scheme is proposed to deal with unmodeled speech sources and overlapped speech signals. Experimental results demonstrate high detection accuracy in a noisy vehicular environment.

Copyright © 2007 Jwu-Sheng Hu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Electronic systems such as mobile phones, global positioning systems (GPS), CD or VCD players, and air conditioners are becoming increasingly popular in vehicles. Intelligent hands-free interfaces, including human-computer interaction (HCI) interfaces [1–3] with speech recognition, have recently been proposed due to concerns over driving safety and convenience. Speech recognition suffers from environmental noise, which is why speech enhancement approaches using multiple microphones [4–7] have been introduced to purify speech signals in noisy environments. The speaker's location is useful information in this setting. For example, in vehicle applications, a driver may wish to exert particular authority in manipulating the in-car electronic systems. Additionally, for speech signal purification, a better receiving beam can be formed with a microphone array to suppress environmental noise if the speaker's location is known. The concept of employing a microphone array to localize a sound source has been developed for over 30 years [8–15]. However, most methods, such as the phase correlation methods shown in [16], do not yield satisfactory results in highly reverberant, scattering, or noisy environments. Consequently, Brandstein and Silverman applied Tukey's biweight to the weighting function to overcome the reflection effect [17]. Additionally, histogram-based