Using Mel-Frequency Cepstral Coefficients in Missing Data Technique

Zhang Jun Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China; School of Electronic and Communication Engineering, South China University of Technology, Guangzhou 510640, China Email: zhj [email protected]

Sam Kwong Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China Email: [email protected]

Wei Gang School of Electronic and Communication Engineering, South China University of Technology, Guangzhou 510640, China Email: [email protected]

Qingyang Hong Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China Email: [email protected]

Received 19 February 2003; Revised 16 June 2003; Recommended for Publication by Mukund Padmanabhan

Filter bank features are the most common choice in research on marginalisation approaches to robust speech recognition, because unreliable data are simple to detect in the frequency domain. In this paper, we propose a hybrid approach, based on marginalisation and soft decision techniques, that uses Mel-frequency cepstral coefficients (MFCCs) instead of filter bank coefficients. A new technique for estimating the reliability of each cepstral component is also presented. Experimental results show the effectiveness of the proposed approaches.

Keywords and phrases: MFCC, missing data techniques, robust speech recognition.

1. INTRODUCTION

In spite of many years of effort, robustness in noisy environments remains a fundamental unsolved issue in today’s automatic speech recognition (ASR) systems. Recently, missing data theory [1, 2, 3, 4] has been proposed as a way to improve the robustness of the ASR decoding process. Experimental results show that it can significantly restore ASR performance while making few prior assumptions about the characteristics of the environmental noise. However, most previous marginalisation approaches have been derived and tested only for filter bank features, because unreliable data are convenient to detect in the frequency domain. Cepstral features, by contrast, are the parameterisation of choice for many speech recognition applications. For example, the Mel-frequency cepstral coefficient (MFCC) [5] representation of speech is probably the most commonly used representation in speech recognition and has recently been standardized for distributed speech recognition (DSR) [6]. In general, cepstral features are more compact, more discriminative, and, most importantly, nearly decorrelated, which allows hidden Markov models (HMMs) to use diagonal covariance matrices effectively. They therefore usually provide higher baseline performance than filter bank features, which makes applying missing data techniques to cepstral features attractive and natural. Unfortunately, while decorrelating, the cepstral transform also smears localized spectral uncertainty into global cepstral uncertainty. This de