Interface for Barge-in Free Spoken Dialogue System Based on Sound Field Reproduction and Microphone Array



Research Article

Shigeki Miyabe,1 Yoichi Hinamoto,2 Hiroshi Saruwatari,1 Kiyohiro Shikano,1 and Yosuke Tatekura3

1 Graduate School of Information Science, Nara Institute of Science and Technology, Takayama-Cho 8916-5, Ikoma-Shi, Nara 630-0192, Japan
2 Department of Control Engineering, Takuma National College of Technology, Takuma-Cho Koda 551, Mitoyo-Shi, Kagawa 769-1192, Japan
3 Faculty of Engineering, Shizuoka University, Johoku 3-5-1, Hamamatsu-Shi, Shizuoka 432-8561, Japan

Received 1 May 2006; Revised 17 October 2006; Accepted 29 October 2006

Recommended by Aki Harma

A barge-in free spoken dialogue interface using sound field control and microphone array is proposed. In the conventional spoken dialogue system using an acoustic echo canceller, it is indispensable to estimate a room transfer function, especially when the transfer function is changed by various interferences. However, the estimation is difficult when the user and the system speak simultaneously. To resolve the problem, we propose a sound field control technique to prevent the response sound from being observed. Combined with a microphone array, the proposed method can achieve high elimination performance with no adaptive process. The efficacy of the proposed interface is ascertained in the experiments on the basis of sound elimination and speech recognition.

Copyright © 2007 Shigeki Miyabe et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

For smooth hands-free communication with a spoken dialogue system, the user's command utterance must reach the system clearly. However, a user might interrupt a sound response from the system to utter a command, or might start speaking before the system's sound response has finished. In such a situation, the sound sent from the system to the user is observed as an acoustic echo return at the microphone used to acquire the user's speech input, and it degrades speech recognition performance on the user's input command. Such a situation is referred to as barge-in [1]. Hereafter, the sound message output by the system is called the response sound.

As a solution to this problem, an acoustic echo canceller is commonly used [2]. Since the echo return of the response sound is a convolution of the known response sound signal and a transfer function from a loudspeaker to a microphone, the echo return can be eliminated by estimating the transfer function with an adaptive filter. Many types of acoustic echo canceller have been proposed, such as single-channel, stereophonic, beamformer-integrated, and wave-synthesis-integrated types [3–6]. The room transfer function is not fixed: it fluctuates with changes in room conditions, such as the movement of people in th