An Acoustic Human-Machine Front-End for Multimedia Applications

Wolfgang Herbordt Telecommunications Laboratory, University Erlangen-Nuremberg, Cauerstraße 7, 91058 Erlangen, Germany Email: [email protected]

Herbert Buchner Telecommunications Laboratory, University Erlangen-Nuremberg, Cauerstraße 7, 91058 Erlangen, Germany Email: [email protected]

Walter Kellermann Telecommunications Laboratory, University Erlangen-Nuremberg, Cauerstraße 7, 91058 Erlangen, Germany Email: [email protected]

Received 31 May 2002 and in revised form 24 September 2002

A concept of robust adaptive beamforming integrating stereophonic acoustic echo cancellation is presented which reconciles the need for low computational complexity and efficient adaptive filtering with versatility and robustness in real-world scenarios. The synergetic combination of a robust generalized sidelobe canceller and a stereo acoustic echo canceller is designed in the frequency domain, based on a general framework for multichannel frequency-domain adaptive filtering. Theoretical analysis and real-time experiments show the superiority of this concept over comparable time-domain approaches in terms of computational complexity and adaptation behaviour. The real-time implementation confirms that the concept is robust and meets the practical requirements of real-world scenarios well, which makes it a promising candidate for commercial products.

Keywords and phrases: hands-free acoustic human-machine front-end, microphone arrays, robust adaptive beamforming, stereophonic acoustic echo cancellation, generalized sidelobe canceller, frequency-domain adaptive filters.

1. INTRODUCTION

With a continuously increasing desire for convenient human-machine interaction, the acoustic interface of any terminal for multimedia or telecommunication services is challenged to allow seamless, hands-free, and untethered audio communication for the benefit of human users. Audio capture is usually responsible for extracting desired signals for the multimedia device or, in telecommunication applications, for remote listeners. Compared to sound capture by a microphone next to the source, seamless audio interfaces as depicted in Figure 1 cause the desired signals to be impaired by (a) acoustic echoes from the loudspeaker(s), (b) local interferers, and (c) reverberation due to distant talking. Techniques for acoustic echo cancellation (AEC) have evolved over the last two decades [1, 2] and led to the recent presentation of a five-channel AEC for real-time operation on a personal computer (PC) [3, 4]. If no distortion of the desired signal should be allowed, suppression of local interference is best handled by microphone arrays [5, 6]. Here, robust adaptive beamforming algorithms are necessary to cope with time-varying acoustic environments, including moving desired sources. Ideally, removing reverberation from the desired signal requires blind identification and inversion of the channel(s) from the source to the sensor(s). For realistic time-varying environments, this problem still awaits theoretical sol