Using Intermicrophone Correlation to Detect Speech in Spatially Separated Noise
Ashish Koul1 and Julie E. Greenberg2

1 Broadband Video Compression Group, Broadcom Corporation, Andover, MA 01810, USA
2 Massachusetts Institute of Technology, 77 Massachusetts Avenue, Room E25-518, Cambridge, MA 02139-4307, USA
Received 29 April 2004; Revised 20 April 2005; Accepted 25 April 2005

This paper describes a system for determining intervals of “high” and “low” signal-to-noise ratios when the desired signal and interfering noise arise from distinct spatial regions. The correlation coefficient between two microphone signals serves as the decision variable in a hypothesis test. The system has three parameters: the center frequency and bandwidth of the bandpass filter that prefilters the microphone signals, and the threshold for the decision variable. Conditional probability density functions of the intermicrophone correlation coefficient are derived for a simple signal scenario. This theoretical analysis provides insight into optimal selection of system parameters. Results of simulations using white Gaussian noise sources are in close agreement with the theoretical results. Results of more realistic simulations using speech sources follow the same general trends and illustrate the performance achievable in practical situations. The system is suitable for use with two microphones in mild-to-moderate reverberation as a component of noise-reduction algorithms that require detecting intervals when a desired signal is weak or absent.

Copyright © 2006 A. Koul and J. E. Greenberg. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
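To make the decision procedure concrete, the following minimal Python sketch illustrates the kind of processing the abstract describes: both microphone signals are bandpass prefiltered, the intermicrophone correlation coefficient is computed over short frames, and each frame is labeled “high” or “low” SNR by comparing that coefficient to a threshold. The Butterworth filter, sampling rate, frame length, and parameter values shown here are illustrative assumptions, not the specific choices analyzed in the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def detect_snr_frames(mic1, mic2, fs=16000, center_hz=1000.0,
                      bandwidth_hz=500.0, threshold=0.5, frame_len=256):
    """Label frames 'high' or 'low' SNR using the intermicrophone
    correlation coefficient as the decision variable.

    The three system parameters are the bandpass center frequency,
    the bandpass bandwidth, and the decision threshold.
    (Illustrative sketch; parameter values are assumptions.)
    """
    # Bandpass-prefilter both microphone signals (4th-order Butterworth here).
    low = max(center_hz - bandwidth_hz / 2.0, 1.0)
    high = min(center_hz + bandwidth_hz / 2.0, fs / 2.0 - 1.0)
    sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
    x1 = sosfiltfilt(sos, mic1)
    x2 = sosfiltfilt(sos, mic2)

    decisions = []
    for start in range(0, len(x1) - frame_len + 1, frame_len):
        f1 = x1[start:start + frame_len]
        f2 = x2[start:start + frame_len]
        # Intermicrophone correlation coefficient for this frame.
        rho = np.corrcoef(f1, f2)[0, 1]
        # High correlation suggests the desired source dominates ("high" SNR);
        # low correlation suggests spatially separated noise dominates ("low" SNR).
        decisions.append("high" if rho >= threshold else "low")
    return decisions
```

In a noise-reduction front end of the kind discussed below, the frames labeled “low” would be the intervals during which noise statistics are updated.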
1. INTRODUCTION
Conventional hearing aids do not selectively attenuate background noise, and their inability to do so is a common complaint of hearing-aid users [1–4]. Researchers have proposed a variety of speech-enhancement and noise-reduction algorithms to address this problem. Many of these algorithms require identification of intervals when the desired speech signal is weak or absent, so that particular noise characteristics can be estimated accurately [5–7]. Systems that perform this function are referred to by a number of terms, including voice activity detectors, speech detectors, pause detectors, and double-talk detectors.

Speech pause detectors are not limited to use in hearing-aid algorithms. They are used in a number of applications including speech recognition [8, 9], mobile telecommunications [10, 11], echo cancellation [12], and speech coding [13].

In some cases, noise-reduction algorithms are initially developed and evaluated using information about the timing of speech pauses derived from the clean signal, which is possible in computer simulations but not in a practical device. Marzinzik and Kollmeier [11] point out that speech pause detectors “are a very sensitive and often limiting part of systems for the reduction of additive noise in