Manifold Learning-Based Feature Transformation for Phone Classification
Abstract. This study investigates approaches to low dimensional speech feature transformation using manifold learning. It has recently been shown that speech sounds may lie on a low dimensional manifold nonlinearly embedded in a high dimensional space. A number of manifold learning techniques have been developed in recent years that attempt to discover this type of underlying geometric structure. Two such techniques, locally linear embedding (LLE) and Isomap, are considered in this study. The low dimensional representations produced by applying these techniques to MFCC feature vectors are evaluated in several phone classification tasks on the TIMIT corpus. Classification accuracy is analysed and compared to that of conventional MFCC features and of features transformed with PCA, a linear dimensionality reduction method. It is shown that features resulting from manifold learning can yield higher classification accuracy than these baseline features. In general, the best phone classification accuracy is obtained with Isomap-transformed features.
1 Introduction
Feature transformation is an important part of the speech recognition process and can be viewed as a two-step procedure. First, relevant information is extracted from short-time segments of the acoustic speech signal using a procedure such as Fourier analysis, cepstral analysis, or some other perceptually motivated analysis. The resulting D-dimensional parameter vectors are then transformed to feature vectors of lower dimensionality d (d ≤ D). The aim of this dimensionality reduction is to produce concise low dimensional representations that retain the most discriminating information for the intended application and are thus better suited to pattern classification. Dimensionality reduction also decreases the computational cost of subsequent processing.

Physiological constraints on the articulators limit the degrees of freedom of the speech production apparatus. As a result, humans are only capable of producing sounds occupying a subspace of the entire acoustic space. Thus, speech data can be viewed as lying on or near a low dimensional manifold embedded in the original acoustic space. The underlying dimensionality of speech has been the subject of much previous research using many different approaches, including classical dimensionality reduction analysis [1], [2], nonlinear dynamical analysis [3], and manifold learning [4]. The consensus of this work is that some speech sounds, particularly voiced speech, are inherently low dimensional.

Dimensionality reduction methods aim to discover such underlying low dimensional structure. These methods can be categorised as linear or nonlinear. Linear methods are limited to discovering the structure of data lying on or near a linear subspace of the high dimensional input space. Two of the most widely used li

M. Chetouani et al. (Eds.): NOLISP 2007, LNAI 4885, pp. 132–141, 2007. © Springer-Verlag Berlin Heidelberg 2007
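The PCA baseline used for comparison in this study reduces D-dimensional parameter vectors to d dimensions by projecting onto the directions of maximum variance. A minimal NumPy sketch of this transformation follows; the frame count, the 39-dimensional MFCC input, and the target dimensionality of 13 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def pca_transform(X, d):
    """Project D-dimensional parameter vectors onto their top-d principal axes."""
    X_centered = X - X.mean(axis=0)
    # Eigen-decomposition of the sample covariance matrix (D x D)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; keep the d largest
    top = eigvecs[:, np.argsort(eigvals)[::-1][:d]]
    return X_centered @ top

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 39))   # 200 frames of hypothetical 39-dim MFCC vectors
Y = pca_transform(X, 13)
print(Y.shape)                   # (200, 13)
```

By construction, the variance of the projected data is largest along the first output dimension and decreases thereafter.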
Data Loading...
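The limitation of linear methods noted above can be illustrated on a standard synthetic example. The following sketch, assuming scikit-learn is available, embeds a two-dimensional manifold (a "swiss roll") nonlinearly in three dimensions and reduces it with both PCA and Isomap; the swiss roll is a stand-in for speech frames on a curved manifold, not data from the paper.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# Points on a 2-D manifold nonlinearly embedded in 3-D; t is the
# intrinsic coordinate along the roll.
X, t = make_swiss_roll(n_samples=800, random_state=0)

# Linear reduction: PCA can only recover a linear subspace.
X_pca = PCA(n_components=2).fit_transform(X)

# Nonlinear reduction: Isomap preserves geodesic (along-manifold) distances
# estimated over a k-nearest-neighbour graph.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# The intrinsic coordinate t should correlate more strongly with an
# Isomap axis than with any single PCA axis.
corr_iso = max(abs(np.corrcoef(t, X_iso[:, i])[0, 1]) for i in range(2))
corr_pca = max(abs(np.corrcoef(t, X_pca[:, i])[0, 1]) for i in range(2))
print(round(corr_iso, 3), round(corr_pca, 3))
```

Because Isomap "unrolls" the manifold, one of its output axes tracks the intrinsic coordinate almost monotonically, whereas the PCA axes mix the curled directions.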