Manifold Learning-Based Feature Transformation for Phone Classification
Abstract. This study investigates approaches to low dimensional speech feature transformation using manifold learning. It has recently been shown that speech sounds may lie on a low dimensional manifold nonlinearly embedded in a high dimensional space. A number of manifold learning techniques have been developed in recent years that attempt to discover this type of underlying geometric structure. Two such techniques, locally linear embedding (LLE) and Isomap, are considered in this study. The low dimensional representations produced by applying these techniques to MFCC feature vectors are evaluated in several phone classification tasks on the TIMIT corpus. Classification accuracy is analysed and compared to that of conventional MFCC features and of features transformed with PCA, a linear dimensionality reduction method. It is shown that features resulting from manifold learning can yield higher classification accuracy than these baseline features. In general, the best phone classification accuracy is obtained with Isomap-transformed features.
1 Introduction
Feature transformation is an important part of the speech recognition process and can be viewed as a two-step procedure. First, relevant information is extracted from short-time segments of the acoustic speech signal using a procedure such as Fourier analysis, cepstral analysis, or some other perceptually motivated analysis. The resulting D-dimensional parameter vectors are then transformed to feature vectors of lower dimensionality d (d ≤ D). The aim of this dimensionality reduction is to produce concise low dimensional representations that retain the most discriminating information for the intended application and are thus better suited to pattern classification. Dimensionality reduction also decreases the computational cost of subsequent processing.

Physiological constraints on the articulators limit the degrees of freedom of the speech production apparatus. As a result, humans are only capable of producing sounds occupying a subspace of the entire acoustic space. Thus, speech data can be viewed as lying on or near a low dimensional manifold embedded in the original acoustic space. The underlying dimensionality of speech has been the subject of much previous research using many different approaches, including classical dimensionality reduction analysis [1], [2], nonlinear dynamical analysis [3], and manifold learning [4]. The consensus of this work is that some speech sounds, particularly voiced speech, are inherently low dimensional.

Dimensionality reduction methods aim to discover such underlying low dimensional structure. These methods can be categorised as linear or nonlinear. Linear methods are limited to discovering the structure of data lying on or near a linear subspace of the high dimensional input space. Two of the most widely used li

M. Chetouani et al. (Eds.): NOLISP 2007, LNAI 4885, pp. 132–141, 2007. © Springer-Verlag Berlin Heidelberg 2007
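The PCA baseline used for comparison in this study reduces D-dimensional parameter vectors to d dimensions by projecting onto the directions of maximum variance. A minimal NumPy sketch of this transformation follows; the frame count, the 39-dimensional MFCC input, and the target dimensionality of 13 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def pca_transform(X, d):
    """Project D-dimensional parameter vectors onto their top-d principal axes."""
    X_centered = X - X.mean(axis=0)
    # Eigen-decomposition of the sample covariance matrix (D x D)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; keep the d largest
    top = eigvecs[:, np.argsort(eigvals)[::-1][:d]]
    return X_centered @ top

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 39))   # 200 frames of hypothetical 39-dim MFCC vectors
Y = pca_transform(X, 13)
print(Y.shape)                   # (200, 13)
```

By construction, the variance of the projected data is largest along the first output dimension and decreases thereafter.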
Data Loading...
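The limitation of linear methods noted above can be illustrated on a standard synthetic example. The following sketch, assuming scikit-learn is available, embeds a two-dimensional manifold (a "swiss roll") nonlinearly in three dimensions and reduces it with both PCA and Isomap; the swiss roll is a stand-in for speech frames on a curved manifold, not data from the paper.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# Points on a 2-D manifold nonlinearly embedded in 3-D; t is the
# intrinsic coordinate along the roll.
X, t = make_swiss_roll(n_samples=800, random_state=0)

# Linear reduction: PCA can only recover a linear subspace.
X_pca = PCA(n_components=2).fit_transform(X)

# Nonlinear reduction: Isomap preserves geodesic (along-manifold) distances
# estimated over a k-nearest-neighbour graph.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# The intrinsic coordinate t should correlate more strongly with an
# Isomap axis than with any single PCA axis.
corr_iso = max(abs(np.corrcoef(t, X_iso[:, i])[0, 1]) for i in range(2))
corr_pca = max(abs(np.corrcoef(t, X_pca[:, i])[0, 1]) for i in range(2))
print(round(corr_iso, 3), round(corr_pca, 3))
```

Because Isomap "unrolls" the manifold, one of its output axes tracks the intrinsic coordinate almost monotonically, whereas the PCA axes mix the curled directions.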