The Use of Adaptive Frame for Speech Recognition
Sam Kwong
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
Email: [email protected]
Qianhua He
Department of Electronic Engineering, South China University of Technology, China
Email: [email protected]

Received 19 January 2001 and in revised form 18 May 2001

We propose an adaptive-frame speech analysis scheme that divides the speech signal into stationary and dynamic regions. Long-frame analysis is used for stationary speech, and short-frame analysis for dynamic speech. For computational convenience, the feature vector of a short frame is designed to be identical in form to that of a long frame, and two expressions are derived to represent the short-frame feature vector. Word recognition experiments on the TIMIT and NON-TIMIT databases with discrete hidden Markov models (DHMM) and continuous-density hidden Markov models (CHMM) showed that a steady performance improvement could be achieved in open-set testing. On the TIMIT database, the adaptive frame length (AFL) approach achieves error reduction rates ranging from 4.47% to 11.21% for DHMM and from 4.54% to 9.58% for CHMM. On the NON-TIMIT database, AFL achieves error reduction rates ranging from 1.91% to 11.55% for DHMM and from 2.63% to 9.5% for CHMM. These results demonstrate the effectiveness of the proposed adaptive frame length feature extraction scheme, especially for open testing, which is a practical measure for evaluating the performance of a speech recognition system.

Keywords and phrases: speech recognition, speech coding, adaptive frame, signal analysis.
1. INTRODUCTION

To date, the most successful speech recognition systems mainly use the hidden Markov model (HMM) for acoustic modeling; HMMs in fact dominate the continuous speech recognition field [1]. To improve recognition performance, a great deal of effort has been devoted to training approaches for HMMs [2, 3, 4] and to variations of the conventional HMM, such as the segment HMM [1] and HMMs with state-conditioned second-order nonstationarity [5].

In general, frame-based feature analysis of speech signals has been accepted as a very successful technique. In this method, the time-domain speech samples are blocked into frames of N samples, with adjacent frames separated by M samples. The spectral characteristic coefficients are then calculated for each frame via a speech analysis (coding) method such as LPC, FFT analysis, Gabor expansion [6], or wavelets [7]. N is usually set to the number of samples in a 30–45 ms segment, and M to N/3 [8]. This procedure is based on the assumption that a speech signal can be considered quasi-stationary if it is examined over a sufficiently short period of time (between 5 and 100 ms). However, this is not true when the signal is measured over long periods of time (on the order of 0.2 seconds or more). To reduce the discontinuities associated with windowing, pitch-synchronous speech processing may be
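The conventional frame-blocking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the 16 kHz sampling rate is an assumption (it matches TIMIT, but the paper does not state it here), and the function name `block_into_frames` is hypothetical.

```python
import numpy as np

def block_into_frames(signal, frame_len, frame_shift):
    """Split a 1-D signal into overlapping frames of frame_len samples,
    with adjacent frame starts separated by frame_shift samples.
    Trailing samples that do not fill a whole frame are dropped."""
    num_frames = 1 + (len(signal) - frame_len) // frame_shift
    return np.stack([signal[i * frame_shift : i * frame_shift + frame_len]
                     for i in range(num_frames)])

# Assumed setup: 16 kHz sampling, 30 ms frames (N = 480), shift M = N/3 = 160
fs = 16000
N = int(0.030 * fs)           # 480 samples per frame
M = N // 3                    # 160-sample frame shift
x = np.arange(fs, dtype=float)  # 1 second of dummy "signal"
frames = block_into_frames(x, N, M)
print(frames.shape)           # one row per frame, N columns
```

With these values, consecutive frames overlap by N - M = 320 samples, so each sample in the interior of the signal appears in three frames, which is the usual trade-off between time resolution and spectral-estimate stability.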