Feature Extraction of the Speech Signal
Abstract Isolated speech recognition, speaker recognition, and continuous speech recognition require a feature vector extracted from the speech signal, which is subjected to pattern recognition to formulate the classifier. The feature vector is extracted from each frame of the speech signal under test. This chapter discusses various parameter extraction techniques, such as linear predictive coefficients (the filter coefficients of the vocal tract model), poles of the vocal tract filter, cepstral coefficients, mel-frequency cepstral coefficients (MFCC), line spectral coefficients, and reflection coefficients. Preprocessing techniques such as dynamic time warping, endpoint detection, and pre-emphasis are also discussed.
3.1 Endpoint Detection

The isolated speech signal recorded through the microphone has noise at both ends of the speech segment. The beginning and the end of the speech segment must therefore be identified within the recorded signal; this is known as endpoint detection, and it is performed as follows.

1. The speech signal S is divided into frames, and the sum-squared value (energy) of each frame is computed. The energy of a frame containing voiced speech (produced by vibration of the vocal cords) is usually greater than that of the noise. Identify the first frame whose energy exceeds a predefined upper threshold. From this point, search backwards for frames whose energy still exceeds a predefined lower threshold, and let the earliest such frame, denoted V, be the first frame of the voiced speech segment.

2. Let S(n) be the nth sample of the speech signal. If sgn(S(n)) sgn(S(n + 1)) is negative, a zero crossing has occurred at the nth sample. The number of zero crossings per frame is known as the zero-crossing rate. The zero-crossing rate of the unvoiced segment adjacent to the voiced segment is larger than that of the noise. Once the first frame of the voiced speech segment V is identified using the energy computation, the first frame of the unvoiced speech segment (if present) prior to V is identified as follows: starting from V, search the previous 25 frames backwards and choose the earliest frame whose zero-crossing rate exceeds a predefined threshold; this frame is declared the first unvoiced speech frame.

3. The above procedure is repeated, starting from the last sample of the speech signal, to identify the endpoint of the speech segment.

%endpointdetection.m
function [res1,res2,speechsegment,utforste,ltforste,ltforzcr]...
    =endpointdetection(S,FS)
%mzcr,mste - mean of the zero-crossing rate and the short-time energy
%            for the first 100 ms
%vzcr,vste - variance of the zero-crossing rate and the short-time energy

E. S. Gopi, Digital Speech Processing Using Matlab, Signals and Communication Technology, DOI: 10.1007/978-81-322-1677-3_3, © Springer India 2014
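The energy computation and threshold search of step 1 can be sketched as follows. The sketch is in Python rather than the chapter's MATLAB so it stays self-contained; the function names, the use of non-overlapping frames, and the two threshold values are illustrative assumptions, not taken from the chapter's code.

```python
import numpy as np

def frame_energies(s, frame_len):
    """Sum-squared energy of each non-overlapping frame of signal s
    (illustrative framing; real detectors often use overlapping frames)."""
    n_frames = len(s) // frame_len
    frames = s[:n_frames * frame_len].reshape(n_frames, frame_len)
    return (frames ** 2).sum(axis=1)

def first_voiced_frame(energies, upper, lower):
    """Step 1: find the first frame whose energy exceeds the upper
    threshold, then walk backwards while the preceding frame's energy
    still exceeds the lower threshold. Returns the index V, or None
    if no frame crosses the upper threshold."""
    above = np.nonzero(energies > upper)[0]
    if above.size == 0:
        return None
    v = int(above[0])
    while v > 0 and energies[v - 1] > lower:
        v -= 1
    return v
```

The two-threshold scheme makes the detector robust: the upper threshold guarantees the frame is genuinely voiced, while the backward walk with the lower threshold recovers the quieter onset frames that precede it.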
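The zero-crossing computation of step 2 can also be sketched in Python. Since unvoiced speech near the voiced segment has a higher zero-crossing rate than the background noise, the sketch takes the earliest frame in the look-back window whose rate exceeds the threshold as the unvoiced onset; the helper names, the handling of zero-valued samples, and the fallback behaviour are assumptions of this sketch.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Number of zero crossings in a frame: a crossing occurs wherever
    sgn(S(n)) * sgn(S(n+1)) is negative (exactly-zero samples are not
    counted as crossings here)."""
    signs = np.sign(frame)
    return int(np.count_nonzero(signs[:-1] * signs[1:] < 0))

def first_unvoiced_frame(zcrs, v, zcr_threshold, lookback=25):
    """Step 2: among up to `lookback` frames preceding the first voiced
    frame `v`, return the earliest frame whose ZCR exceeds the threshold,
    taken as the start of unvoiced speech; fall back to `v` if none does."""
    for i in range(max(0, v - lookback), v):
        if zcrs[i] > zcr_threshold:
            return i
    return v
```

For example, an alternating-sign frame such as [1, -1, 1, -1] crosses zero between every pair of samples, giving a rate of 3 for 4 samples.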