Channel Effect Compensation in LSF Domain

An-Tze Yu
Department of Computer Science, National Chubei Senior High School, Chubei, Hsinchu, Taiwan 302, Taiwan
Email: [email protected]

Hsiao-Chuan Wang
Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 300, Taiwan
Email: [email protected]

Received 15 April 2003 and in revised form 9 May 2003

This study addresses the problem of the channel effect in the line spectrum frequency (LSF) domain. LSF parameters are popular speech features encoded in the bit stream for low bit-rate speech transmission. A method of channel effect compensation in the LSF domain is therefore of interest for robust speech recognition in mobile communication and Internet systems. If the bit error rate in the transmission of digitally encoded speech is negligibly low, the channel distortion comes mainly from the microphone or the handset. When the speech signal is represented in terms of the phase of the inverse filter derived from LP analysis, this channel distortion can be expressed in terms of the channel phase. Further derivation shows that mean subtraction performed on the phase of the inverse filter can minimize the channel effect. Based on this finding, an iterative algorithm is proposed to remove the bias on LSFs due to the channel effect. Experiments on simulated channel-distorted speech and on real telephone speech are conducted to show the effectiveness of the proposed method. Its performance is comparable to that of cepstral mean normalization (CMN) applied to cepstral coefficients.

Keywords and phrases: line spectrum frequency, channel distortion, channel effect compensation, robust speech recognition.

1. INTRODUCTION

Channel distortion is a serious problem in speech recognition systems and may drastically degrade recognition performance [1, 2, 3]. The channel effect in the cepstral domain has been extensively studied, and many approaches have been proposed for eliminating the influence of channel distortion on speech recognition performance [4, 5, 6, 7, 8, 9]. However, few studies address the channel effect in the line spectrum frequency (LSF) domain. LSFs are the parameters commonly used for low bit-rate speech transmission (e.g., ITU-T G.723.1, G.728, G.729, TIA IS-96, and IS-127). A speech or speaker recognition algorithm based on LSFs is therefore of interest in mobile communication and Internet systems [10, 11, 12, 13, 14]. Although LSF parameters perform poorly in a large vocabulary continuous speech recognition (LVCSR) system, they can achieve performance comparable to that of cepstral coefficients in connected digit recognition or small vocabulary speech recognition systems [12, 13]. Since the LSF parameters can be extracted directly from the bit stream of encoded speech, they are promising features for speech recognition in some simple applications.
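For readers unfamiliar with LSFs, the following is a minimal sketch of the standard textbook conversion from LP coefficients to LSFs, not the procedure of any particular codec named above. It assumes NumPy, an LP inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p that is minimum phase, and a hypothetical helper name `lpc_to_lsf`. The LSFs are taken as the angles in (0, pi) of the roots of the symmetric and antisymmetric polynomials formed from A(z).

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LP coefficients [1, a1, ..., ap] to LSFs in radians.

    Illustrative helper (name and interface are assumptions).
    Assumes A(z) is minimum phase, so all roots of P and Q lie on
    the unit circle.
    """
    a = np.asarray(a, dtype=float)
    # Pad A(z) with a zero coefficient for z^-(p+1).
    a_ext = np.concatenate([a, [0.0]])
    # P(z) = A(z) + z^-(p+1) A(1/z)  (symmetric polynomial)
    # Q(z) = A(z) - z^-(p+1) A(1/z)  (antisymmetric polynomial)
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one angle per conjugate pair and drop the trivial roots at z = +1 and z = -1.
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

# Example with a second-order filter A(z) = 1 - 0.9 z^-1 + 0.5 z^-2.
lsf = lpc_to_lsf([1.0, -0.9, 0.5])
```

For a filter of order p this yields p frequencies in ascending order, with the roots of P and Q interleaved. Practical codecs typically use faster root-search schemes than a general polynomial solver.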

The codec process is another factor that influences speech quality [15]. Since the encoded speech parameters are the only available informa