DWT and LPC based feature extraction methods for isolated word recognition
Navnath S Nehe1* and Raghunath S Holambe2
* Correspondence: [email protected]
1 Department of Instrumentation Engineering, Pravara Rural Engineering College, Loni 413736, Maharashtra, India
Full list of author information is available at the end of the article
Abstract
In this article, new feature extraction methods that utilize wavelet decomposition and reduced-order linear predictive coding (LPC) coefficients have been proposed for speech recognition. The coefficients are derived from speech frames decomposed using the discrete wavelet transform (DWT). LPC coefficients derived from this subband decomposition of a speech frame (abbreviated as WLPC) provide a better representation than modeling the frame directly. The WLPC coefficients are further normalized in the cepstral domain to obtain a new set of features, denoted wavelet subband cepstral mean normalized features. The proposed approaches provide effective (better recognition rate), efficient (reduced feature vector dimension), and noise-robust features. The performance of these techniques has been evaluated on the TI-46 isolated word database and a Marathi digits database created by the authors, in a white noise environment, using the continuous density hidden Markov model. The experimental results also show the superiority of the proposed techniques over conventional methods such as linear predictive cepstral coefficients, Mel-frequency cepstral coefficients, spectral subtraction, and cepstral mean normalization in the presence of additive white Gaussian noise.

Keywords: feature extraction, linear predictive coding, discrete wavelet transform, cepstral mean normalization, hidden Markov model
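To make the pipeline in the abstract concrete, the following Python sketch outlines the general flow: decompose each speech frame with the DWT, fit a low-order LPC model to each subband signal, convert the LPC coefficients to cepstral coefficients, and subtract the utterance-level cepstral mean. This is a minimal sketch, assuming the NumPy, SciPy, and PyWavelets packages; the wavelet (db4), decomposition level (3), LPC order (6), and cepstral count (6) are illustrative choices, not the settings reported in this article.

```python
# Sketch of a WLPC-style front end: DWT subband decomposition, low-order LPC
# per subband, LPC-to-cepstrum conversion, and cepstral mean normalization.
# All parameter values are illustrative assumptions, not the authors' settings.
import numpy as np
import pywt
from scipy.linalg import solve_toeplitz

def lpc(signal, order):
    """Autocorrelation-method LPC coefficients (assumes len(signal) > order)."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    r[0] += 1e-8  # tiny regularization so the Toeplitz system stays well conditioned
    return solve_toeplitz(r[:order], r[1:order + 1])

def lpc_to_cepstrum(a, n_ceps):
    """Standard recursion from predictor coefficients to LPC cepstral coefficients."""
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= len(a) else 0.0
        for k in range(1, n):
            if n - k <= len(a):
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def wlpc_features(frame, wavelet="db4", level=3, order=6, n_ceps=6):
    """One frame -> concatenated per-subband LPC cepstral coefficients."""
    subbands = pywt.wavedec(frame, wavelet, level=level)
    return np.concatenate([lpc_to_cepstrum(lpc(sb, order), n_ceps)
                           for sb in subbands])

def wscmn(frames, **kw):
    """Stack per-frame features and subtract the utterance-level cepstral mean."""
    feats = np.vstack([wlpc_features(f, **kw) for f in frames])
    return feats - feats.mean(axis=0)
```

Under these assumed settings the feature vector has (level + 1) × n_ceps = 24 components per frame, which illustrates the kind of reduced feature dimension the abstract refers to.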
1. Introduction
A speech recognition system has two major components, namely, feature extraction and classification. The feature extraction method plays a vital role in the speech recognition task. There are two dominant approaches to acoustic measurement. The first is a temporal-domain, parametric approach such as linear prediction [1], developed to closely match the resonant structure of the human vocal tract that produces the corresponding sound. The linear prediction coefficient (LPC) technique is not well suited for representing speech because it assumes the signal is stationary within a given frame and hence cannot analyze localized events accurately. It is also unable to capture unvoiced and nasalized sounds properly [2]. The second is a nonparametric, frequency-domain approach based on the human auditory perception system, known as Mel-frequency cepstral coefficients (MFCC) [3]. The widespread use of MFCCs is due to their low computational complexity and better performance for ASR under clean, matched conditions. The performance of MFCC degrades rapidly in the presence of noise, and the degradation worsens as the signal-to-noise ratio (SNR) decreases. Poor performance of LPC and its different forms, such as reflection coefficients and linear prediction cepstral coefficients (LPCC), as well as MFCC and its various f
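Since the discussion contrasts LPC with the MFCC front end, a compact illustration may help. The sketch below is a minimal, generic MFCC computation using NumPy and SciPy, not the exact configuration used in this article; the FFT length, filterbank size, and number of coefficients are assumed values for illustration. It shows the nonparametric, perception-motivated steps: Hamming windowing, power spectrum, triangular mel filterbank, log compression, and DCT.

```python
# Minimal MFCC sketch: windowed frame -> power spectrum -> mel filterbank
# -> log -> DCT. Filterbank size, FFT length, and coefficient count are
# illustrative assumptions.
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced uniformly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc(frame, fs, n_fft=512, n_filters=26, n_ceps=13):
    """MFCCs of a single speech frame."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    energies = mel_filterbank(n_filters, n_fft, fs) @ spec
    return dct(np.log(energies + 1e-10), type=2, norm="ortho")[:n_ceps]
```

Because every step operates on the spectrum of the whole frame, additive noise spreads across all filterbank channels, which gives some intuition for the noise-related degradation described above.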