Frequency-Domain Blind Source Separation of Many Speech Signals Using Near-Field and Far-Field Models
Ryo Mukai, Hiroshi Sawada, Shoko Araki, and Shoji Makino

NTT Communication Science Laboratories, NTT Corporation, 2-4 Hikaridai, Seika-Cho, Soraku-Gun, Kyoto 619-0237, Japan

Received 19 December 2005; Revised 26 April 2006; Accepted 11 June 2006

We discuss the frequency-domain blind source separation (BSS) of convolutive mixtures when the number of source signals is large and the potential source locations are omnidirectional. The most critical problem in frequency-domain BSS is the permutation problem, and geometric information is helpful in solving it. In this paper, we propose a method for obtaining proper geometric information with which to solve the permutation problem when the number of source signals is large and some of the signals come from the same or a similar direction. First, we describe a method for estimating the absolute DOA by using relative DOAs obtained from the solution provided by independent component analysis (ICA) and the far-field model. Next, we propose a method for estimating the spheres on which source signals exist by using the ICA solution and the near-field model. We also address another problem of frequency-domain BSS that arises from the circularity of the discrete-frequency representation. We discuss the characteristics of the problem and present a solution. Experimental results using eight microphones in a room show that the proposed method can separate a mixture of six speech signals arriving from various directions, even when two of them come from the same direction.

Copyright © 2006 Ryo Mukai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. INTRODUCTION
Blind source separation (BSS) [1, 2] is a technique for estimating original source signals using only observed mixtures. The BSS of audio signals has a wide range of applications including speech enhancement [3] for speech recognition, hands-free telecommunication systems, and high-quality hearing aids. Independent component analysis (ICA) [4–7] is one of the main statistical methods used for BSS. In theory, ICA can solve the BSS problem for a large number of sources, provided that the number of sensors is equal to or greater than the number of source signals. In practice, however, there are many difficulties. In most realistic audio applications, the signals are mixed in a convolutive manner with reverberations, and the separation system that we have to estimate is a matrix of filters, not just a matrix of scalars. Although many studies have been undertaken on BSS in a reverberant environment [8], most of them have assumed two source signals arriving from different directions, and only a few studies have dealt with more than two source signals. There are two major approaches to solving the convolut
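As an illustration of the convolutive mixing described in this introduction, the following is a minimal NumPy sketch, not the authors' implementation. It assumes two sources, two microphones, and random 8-tap FIR filters standing in for room impulse responses; the names `convolutive_mix`, `sources`, and `filters` are hypothetical. It shows that a matrix of filters in the time domain corresponds to one scalar mixing matrix per frequency bin, provided the DFT is long enough to avoid circular wrap-around.

```python
import numpy as np


def convolutive_mix(sources, filters):
    """Mix sources with FIR filters: x_j(t) = sum_k (h_jk * s_k)(t).

    sources: array of shape (n_src, n_samples)
    filters: array of shape (n_mic, n_src, n_taps); hypothetical stand-ins
             for room impulse responses (an assumption of this sketch)
    returns: array of shape (n_mic, n_samples + n_taps - 1)
    """
    n_mic, n_src, n_taps = filters.shape
    n_out = sources.shape[1] + n_taps - 1
    mixed = np.zeros((n_mic, n_out))
    for j in range(n_mic):
        for k in range(n_src):
            # Full linear convolution of source k with the filter to mic j
            mixed[j] += np.convolve(sources[k], filters[j, k])
    return mixed


rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 256))     # synthetic signals, not speech
filters = rng.standard_normal((2, 2, 8))    # random 8-tap mixing filters
mixed = convolutive_mix(sources, filters)

# Zero-pad every DFT to the full linear-convolution length so that
# circular convolution coincides with linear convolution.
n_fft = mixed.shape[1]
S = np.fft.rfft(sources, n=n_fft, axis=1)   # (n_src, n_bins)
H = np.fft.rfft(filters, n=n_fft, axis=2)   # (n_mic, n_src, n_bins)
X = np.fft.rfft(mixed, n=n_fft, axis=1)     # (n_mic, n_bins)

# In each bin f the mixture is instantaneous: X[:, f] = H[:, :, f] @ S[:, f]
X_from_bins = np.einsum("jkf,kf->jf", H, S)
max_error = np.max(np.abs(X - X_from_bins))
```

In this sketch `max_error` should sit at floating-point rounding level, which is why a separate scalar unmixing matrix can be estimated in each bin. Because each bin is handled independently, the order of the separated outputs can differ from bin to bin, which is the permutation problem named in the abstract.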