Evolutionary Splines for Cepstral Filterbank Optimization in Phoneme Classification


Research Article

Leandro D. Vignolo,1 Hugo L. Rufiner,1 Diego H. Milone,1 and John C. Goddard2

1 Research Center for Signals, Systems and Computational Intelligence, Department of Informatics, National University of Litoral, CONICET, Santa Fe, 3000, Argentina
2 Departamento de Ingeniería Eléctrica, Universidad Autónoma Metropolitana, Unidad Iztapalapa, Mexico D.F., 09340, Mexico

Correspondence should be addressed to Leandro D. Vignolo, [email protected]

Received 14 July 2010; Revised 29 October 2010; Accepted 24 December 2010

Academic Editor: Raviraj S. Adve

Copyright © 2011 Leandro D. Vignolo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Mel-frequency cepstral coefficients have long been the most widely used type of speech representation. They were introduced to incorporate biologically inspired characteristics into artificial speech recognizers. Recently, the introduction of new alternatives to the classic mel-scaled filterbank has led to improvements in the performance of phoneme recognition in adverse conditions. In this work we propose a new bioinspired approach for the optimization of the filterbanks, in order to find a robust speech representation. Our approach, which relies on evolutionary algorithms, reduces the number of parameters to optimize by using spline functions to shape the filterbanks. The success rates of a phoneme classifier based on hidden Markov models are used as the fitness measure, evaluated over the well-known TIMIT database. The results show that the proposed method is able to find optimized filterbanks for phoneme recognition, which significantly increase robustness in adverse conditions.

1. Introduction

Most current speech recognizers rely on the traditional mel-frequency cepstral coefficients (MFCC) [1] for the feature extraction phase. This representation is biologically motivated and introduces the use of a psychoacoustic scale to mimic the frequency response of the human ear. However, as the entire auditory system is complex and not yet fully understood, the shape of the true optimal filterbank for automatic recognition is not known. Moreover, the recognition performance of automatic systems degrades when speech signals are contaminated with noise. This has motivated the development of alternative speech representations, many of which consist of modifications to the mel-scaled filterbank, for which the number of filters has been empirically set to different values [2]. For example, Skowronski and Harris [3, 4] proposed a novel scheme for determining filter bandwidth and reported significant recognition improvements compared to those obtained with the traditional MFCC features. Other approaches follow a common strategy, which consists of optimizing a speech representation so that phoneme discrimination is maximized for a
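For readers unfamiliar with the baseline discussed above, the following is a minimal sketch of a classic mel-scaled triangular filterbank, the kind of fixed design that the alternative representations modify. It is not the authors' method. The mel formula used is one common variant, and the function names and the parameter values in the usage note (23 filters, 257 frequency bins, 16 kHz sampling) are illustrative assumptions that do not come from the text.

```python
import math

def hz_to_mel(f_hz):
    # One common mel-scale formula; other variants exist (assumption).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_points(n_filters, f_min, f_max):
    # n_filters + 2 frequencies (in Hz), equally spaced on the mel scale.
    # Filter i uses points i, i+1, i+2 as its left edge, center, right edge.
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]

def triangular_gain(freq, left, center, right):
    # Gain of a unit-height triangular filter at frequency `freq`.
    if left < freq <= center:
        return (freq - left) / (center - left)
    if center < freq < right:
        return (right - freq) / (right - center)
    return 0.0

def mel_filterbank(n_filters, n_bins, sample_rate):
    # One row per filter, one column per frequency bin from 0 Hz to Nyquist.
    nyquist = sample_rate / 2.0
    pts = mel_points(n_filters, 0.0, nyquist)
    bin_freqs = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    return [
        [triangular_gain(f, pts[i], pts[i + 1], pts[i + 2]) for f in bin_freqs]
        for i in range(n_filters)
    ]
```

Calling `mel_filterbank(23, 257, 16000)` under these assumptions yields 23 rows of 257 gains each. The filter positions and widths are fixed entirely by the mel formula, which is the design choice that optimization-based approaches revisit.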
