Pattern Recognition for Speech Detection
Abstract This chapter describes the supervised pattern recognition techniques used to design classifiers for speech and speaker detection: the back-propagation neural network (BPNN), support vector machine (SVM), hidden Markov model (HMM), and Gaussian mixture model (GMM). Unsupervised techniques such as the fuzzy k-means algorithm and the Kohonen self-organizing map (KSOM) are also discussed, along with dimensionality reduction techniques such as principal component analysis (PCA), linear discriminant analysis (LDA), kernel LDA, and independent component analysis (ICA). The techniques described in this chapter are illustrated using MATLAB for better understanding.
1.1 Introduction
Speech recognition involves identifying an isolated word from the corresponding speech signal. From the speech signal corresponding to a particular word, one or more features such as linear predictive coefficients (LPC) and Mel-frequency cepstral coefficients (MFCC) (refer Chap. 3) are collected and arranged as the elements of a vector. This process is known as feature extraction, and the resulting vector is known as the feature vector. Each element of the vector is known as an attribute. Suppose we need to design a digit classifier that identifies the words zero to nine (digits) from the corresponding speech signals. A large number (several hundred) of feature vectors for each digit are collected from various speakers to design a speaker-independent classifier. Fifty percent of the collected data are used for designing the classifier; this is generally known as the training phase. The remaining 50 % are used for testing the classifier. The collected feature vectors belonging to the same digit form a cluster. The number of clusters (classes) in this example is ten.
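The 50/50 train/test split described above can be sketched as follows. This is an illustrative Python sketch (the book itself uses MATLAB); random vectors stand in for real LPC/MFCC feature vectors, and all names here are assumptions:

```python
import random

random.seed(0)

# Hypothetical stand-in for extracted feature vectors: for each of the
# ten digit classes, 100 feature vectors with two attributes each.
data = {digit: [[random.random(), random.random()] for _ in range(100)]
        for digit in range(10)}

# Split each cluster 50/50: half for the training phase, half for testing.
train, test = {}, {}
for digit, vectors in data.items():
    random.shuffle(vectors)
    half = len(vectors) // 2
    train[digit] = vectors[:half]
    test[digit] = vectors[half:]

print(len(train[0]), len(test[0]))  # 50 feature vectors per digit in each phase
```

Splitting within each cluster (rather than over the pooled data) keeps every digit equally represented in both phases.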
E. S. Gopi, *Digital Speech Processing Using Matlab*, Signals and Communication Technology, DOI: 10.1007/978-81-322-1677-3_1, © Springer India 2014
[Fig. 1.1 Illustration of three types of multiclass classifier: a Type 1; b Type 2; c Type 3. The + in each panel indicates the region for which the corresponding boundary line equation gives a positive value.]
The classifier identifies the separating boundary for the individual clusters. Consider the case in which the number of attributes equals two, so the feature vectors are scattered in the 2D (X-Y) plane. The coordinates on the plane are represented as (x, y). The generalized equation of the linear boundary that separates a particular cluster is given as d(x, y) = ax + by + c = 0. Based on the formulation of the separating boundary, the multiclass classifier is classified as Type 1, Type 2, or Type 3 (refer Fig. 1.1), as described below.
1. Type 1
The boundary line that separates cluster i from the other feature vectors is represented as d_i. The number of decision boundary lines is equal to the number of classes.
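The Type 1 (one-versus-rest) decision rule can be sketched as below. This is an illustrative Python sketch, not the book's MATLAB code; the boundary coefficients are hand-picked assumptions for three clusters, whereas in practice they are learned during the training phase:

```python
# Illustrative boundary-line coefficients (a, b, c), one line per cluster.
boundaries = [
    (1.0, 0.0, -1.0),   # d1(x, y) = x - 1
    (0.0, 1.0, -1.0),   # d2(x, y) = y - 1
    (-1.0, -1.0, 1.0),  # d3(x, y) = -x - y + 1
]

def d(i, x, y):
    """Evaluate the i-th boundary line d_i(x, y) = a*x + b*y + c."""
    a, b, c = boundaries[i]
    return a * x + b * y + c

def classify(x, y):
    """Type 1 rule: assign (x, y) to cluster i when d_i(x, y) > 0 and
    every other boundary function is negative; otherwise the point falls
    in an indeterminate region and None is returned."""
    positive = [i for i in range(len(boundaries)) if d(i, x, y) > 0]
    return positive[0] if len(positive) == 1 else None

print(classify(2.0, 0.2))   # only d1 is positive -> cluster 0
print(classify(0.2, 2.0))   # only d2 is positive -> cluster 1
```

Note that with one line per cluster, some regions of the plane make more than one (or no) boundary function positive; such indeterminate regions are a known limitation of the Type 1 formulation.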