Using Mel-Frequency Cepstral Coefficients in Missing Data Technique


Zhang Jun
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
School of Electronic and Communication Engineering, South China University of Technology, Guangzhou 510640, China

Sam Kwong
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China

Wei Gang
School of Electronic and Communication Engineering, South China University of Technology, Guangzhou 510640, China

Qingyang Hong
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China

Received 19 February 2003; Revised 16 June 2003; Recommended for Publication by Mukund Padmanabhan

Filter-bank coefficients are the most common features employed in research on marginalisation approaches to robust speech recognition, because unreliable data are simple to detect in the frequency domain. In this paper, we propose a hybrid approach based on marginalisation and soft-decision techniques that uses Mel-frequency cepstral coefficients (MFCCs) instead of filter-bank coefficients. A new technique for estimating the reliability of each cepstral component is also presented. Experimental results show the effectiveness of the proposed approaches.

Keywords and phrases: MFCC, missing data techniques, robust speech recognition.
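As a point of reference for the marginalisation and soft-decision techniques named above, the following Python sketch shows their textbook forms for a single diagonal-covariance Gaussian state density, as typically used with filter-bank features. It is not the hybrid MFCC method proposed in this paper and does not reproduce the paper's cepstral reliability estimator. The function names are hypothetical, and the soft-decision factor w * p(x_i) + (1 - w) is an assumed, simplified form that marginalises each component over the whole real line rather than over a bounded range.

```python
import math

def log_gauss_1d(x, mean, var):
    """Log density of the univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def marginal_log_likelihood(x, mean, var, mask):
    """Hard-decision marginalisation (hypothetical helper).

    With a diagonal covariance the density factorises over components, and
    integrating an unreliable component (mask value 0) over the whole real
    line contributes a factor of 1, so that component is simply dropped.
    """
    return sum(log_gauss_1d(xi, m, v)
               for xi, m, v, reliable in zip(x, mean, var, mask) if reliable)

def soft_log_likelihood(x, mean, var, weights):
    """Soft-decision variant (hypothetical helper, ASSUMED simplified form).

    Each component contributes w * p(x_i) + (1 - w) * 1, where w in [0, 1]
    is a reliability weight and 1 is the integral of p over the real line.
    w = 1 keeps the full density term; w = 0 drops the component.
    Note: math.exp may underflow to 0 for far outliers with w = 1; a real
    decoder would stay in the log domain to avoid that.
    """
    total = 0.0
    for xi, m, v, w in zip(x, mean, var, weights):
        p = math.exp(log_gauss_1d(xi, m, v))
        total += math.log(w * p + (1.0 - w))
    return total

# Toy 3-dimensional feature vector; component 1 is assumed noise-corrupted.
x, mean, var = [1.0, 5.0, -0.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
print(marginal_log_likelihood(x, mean, var, [1, 1, 1]))    # full likelihood
print(marginal_log_likelihood(x, mean, var, [1, 0, 1]))    # hard mask
print(soft_log_likelihood(x, mean, var, [1.0, 0.5, 1.0]))  # soft weight
```

The hard mask removes the influence of the outlying value 5.0 entirely, whereas a weight of 0.5 only limits it; with weights of exactly 0 and 1 the soft form reduces to the hard mask.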

1. INTRODUCTION

In spite of many years of effort, robust speech recognition in noisy environments is still a fundamental unsolved problem in today's automatic speech recognition (ASR) systems. Recently, missing data theory [1, 2, 3, 4] has been proposed as a way to improve the robustness of the ASR decoding process. Experimental results show that it can significantly restore ASR performance while making few prior assumptions about the characteristics of the environmental noise. However, most previous marginalisation approaches have been derived and tested only for filter-bank features, because unreliable data are convenient to detect in the frequency domain. Cepstral features, by contrast, are the parameterisation of choice for most speech recognition applications. For example, the Mel-frequency cepstral coefficient (MFCC) [5] representation is probably the most widely used representation in speech recognition and has recently been standardized for distributed speech recognition (DSR) [6]. Generally, cepstral features are more compact and more discriminative and, most importantly, nearly decorrelated, which allows hidden Markov models (HMMs) to use diagonal covariance matrices effectively. As a result, they usually provide higher baseline performance than filter-bank features. Applying missing data techniques to cepstral features is therefore attractive and natural. Unfortunately, while decorrelating the features, the cepstral transform also smears localized spectral uncertainty into global cepstral uncertainty. This de
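To make the smearing effect concrete, the following Python sketch applies an orthonormal DCT-II, assumed here as the final step of MFCC extraction (the exact scaling used in [5] and [6] may differ), to a hypothetical vector of 24 log filter-bank energies. Corrupting a single channel, as a narrow-band noise burst might, changes every one of the 13 retained cepstral coefficients. The function name `dct_ii`, the channel count, and all numbers are illustrative assumptions.

```python
import math

def dct_ii(log_energies, num_ceps):
    """Orthonormal DCT-II of log filter-bank energies, keeping the first
    num_ceps coefficients (index 0 is the C0 term)."""
    n_chan = len(log_energies)
    ceps = []
    for k in range(num_ceps):
        # Orthonormal scaling: sqrt(1/N) for k = 0, sqrt(2/N) otherwise.
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_chan)
        ceps.append(scale * sum(
            e * math.cos(math.pi * k * (n + 0.5) / n_chan)
            for n, e in enumerate(log_energies)))
    return ceps

# Hypothetical clean log filter-bank energies for 24 Mel channels.
clean = [10.0 - 0.2 * n for n in range(24)]

# Corrupt only channel 5 (+3 in log energy), e.g. a narrow-band noise burst.
noisy = list(clean)
noisy[5] += 3.0

c_clean = dct_ii(clean, 13)
c_noisy = dct_ii(noisy, 13)

# One unreliable spectral channel vs. how many altered cepstral coefficients.
changed = sum(1 for a, b in zip(c_clean, c_noisy) if abs(a - b) > 1e-9)
print(f"1 of 24 channels corrupted -> {changed} of 13 cepstral coefficients changed")
# prints: 1 of 24 channels corrupted -> 13 of 13 cepstral coefficients changed
```

Because the DCT is linear, the cepstral error is exactly the corruption (3.0 here) times one column of the DCT basis, and for this channel that column has no zero entries among the first 13 rows. A reliability decision made per spectral channel therefore cannot simply be carried over to individual cepstral components.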
