Accuracy of MFCC-Based Speaker Recognition in Series 60 Device
Juhani Saastamoinen Department of Computer Science, University of Joensuu, P.O. Box 111, 80101 Joensuu, Finland Email: [email protected]
Evgeny Karpov Department of Computer Science, University of Joensuu, P.O. Box 111, 80101 Joensuu, Finland Email: [email protected]
Ville Hautamäki Department of Computer Science, University of Joensuu, P.O. Box 111, 80101 Joensuu, Finland Email: [email protected]
Pasi Fränti Department of Computer Science, University of Joensuu, P.O. Box 111, 80101 Joensuu, Finland Email: [email protected]

Received 1 October 2004; Revised 14 June 2005; Recommended for Publication by Markus Rupp

A fixed-point implementation of speaker recognition based on MFCC signal processing is considered. We analyze the numerical error of the MFCC computation and its effect on recognition accuracy. Techniques to reduce the information loss in a converted fixed-point implementation are introduced. We increase the signal processing accuracy by adjusting the ratio between the representation accuracy of the operators and that of the signal. The signal processing error is found to be more important to speaker recognition accuracy than the error in the classification algorithm. The results are verified by applying the alternative technique to speech data. We also discuss the specific programming requirements imposed by Symbian and Series 60.

Keywords and phrases: speaker identification, fixed-point arithmetic, round-off error, MFCC, FFT, Symbian.
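The idea of balancing the representation accuracy of operators against that of the signal can be illustrated with a small fixed-point model. The sketch below is our own illustration under stated assumptions, not the authors' implementation: values are stored as integers with a chosen number of fractional bits (Q format), a product is rounded back to `out_frac` fractional bits with a rounding right shift, and the hypothetical helpers `to_fixed`, `fixed_mul`, `product_error`, and `worst_error` measure the resulting round-off error. Python integers never overflow, so the word-length and saturation limits of a real 16/32-bit target are not modeled.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x to the nearest multiple of 2**-frac_bits (Python round: ties to even)."""
    return round(x * (1 << frac_bits))


def to_float(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, a_frac: int, b: int, b_frac: int, out_frac: int) -> int:
    """Multiply two fixed-point integers.

    The raw product has a_frac + b_frac fractional bits; it is rounded
    down to out_frac fractional bits with a right shift.
    """
    prod = a * b
    shift = a_frac + b_frac - out_frac
    if shift <= 0:
        return prod << -shift
    # Add half an LSB before the (arithmetic) shift: round to nearest, ties toward +inf.
    return (prod + (1 << (shift - 1))) >> shift


def product_error(x: float, c: float, sig_frac: int, coef_frac: int, out_frac: int) -> float:
    """Absolute error of the fixed-point product of signal value x and coefficient c."""
    q = fixed_mul(to_fixed(x, sig_frac), sig_frac,
                  to_fixed(c, coef_frac), coef_frac, out_frac)
    return abs(to_float(q, out_frac) - x * c)


def worst_error(sig_frac: int, coef_frac: int, out_frac: int) -> float:
    """Largest product error over a small grid of values in (-1, 1)."""
    grid = [k / 7 for k in range(-6, 7)]
    return max(product_error(x, c, sig_frac, coef_frac, out_frac)
               for x in grid for c in grid)


if __name__ == "__main__":
    # Same signal format, three coefficient formats: fewer coefficient bits, larger error.
    for coef_frac in (8, 12, 15):
        print(coef_frac, worst_error(sig_frac=15, coef_frac=coef_frac, out_frac=15))
```

Holding the signal at 15 fractional bits and cutting the coefficient to 8 bits noticeably raises the worst-case product error. On a real device, the fractional bits given to one operand must be traded against the headroom left for the other within the same word length.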
1. INTRODUCTION
Speech research and application development deal with three main problems: speech synthesis, speech recognition, and speaker recognition. We are working in a speech technology project where one of the main goals is to integrate automatic speaker recognition into Series 60 mobile phones. In speaker recognition, we have a recorded speech sample and try to determine to whom the voice belongs. This study involves closed-set speaker identification, where an unknown sample is compared to previously trained voice models in a speaker database. Speaker identification is a speech classification problem. From the training material, we create speaker-specific voice models, which divide the feature space into distinct classes. Unknown speech is transformed into a sequence of features, which are scored against the voice models. The speaker whose model has the best overall match with the input features is identified.

There are many ways to choose the features and to decide how they are used. Our research team has studied, for example, how the feature design [1] or the concurrent use of multiple features [2] affects recognition accuracy. Our speaker identification method is a generic automatic learning classification method with mel-frequency cepstral coefficient (MFCC) features. The classification algorithm that we use in this study is a common unsupervised vector quantizer. We have ported the identification system to a Series 60 Symbian mobile phone. In this study, we introduce the Series 60 pl
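The identification procedure described above (train one model per speaker, score the unknown features against every model, pick the best match) can be sketched with a vector quantizer. This is a minimal sketch under our own assumptions, since the text does not specify these details: codebooks are trained with plain k-means (Lloyd's algorithm), distance is squared Euclidean, and the match score is the mean quantization distortion, where lower is better. The names `train_codebook`, `avg_distortion`, and `identify` are hypothetical. Real input would be MFCC vectors; the demo uses synthetic 2-D points.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, codebook: List[List[float]]) -> int:
    """Index of the codevector closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))


def train_codebook(features: List[Vector], size: int,
                   iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain k-means codebook; initial codevectors are drawn from the data."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(features, size)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in codebook]
        for v in features:
            clusters[nearest(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cell keeps its previous codevector
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def avg_distortion(features: List[Vector], codebook: List[List[float]]) -> float:
    """Mean squared distance from each feature vector to its nearest codevector."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in features) / len(features)


def identify(features: List[Vector], models: Dict[str, List[List[float]]]) -> str:
    """Closed-set identification: the speaker whose codebook gives the lowest distortion."""
    return min(models, key=lambda spk: avg_distortion(features, models[spk]))


if __name__ == "__main__":
    rng = random.Random(1)

    def blob(cx: float, cy: float, n: int) -> List[List[float]]:
        # Synthetic 2-D stand-ins for MFCC vectors, clustered around (cx, cy).
        return [[rng.gauss(cx, 0.1), rng.gauss(cy, 0.1)] for _ in range(n)]

    models = {"speaker_a": train_codebook(blob(0, 0, 50), 4),
              "speaker_b": train_codebook(blob(3, 3, 50), 4)}
    print(identify(blob(3, 3, 20), models))  # expected: speaker_b
```

Because lower distortion means a better match, `identify` takes the minimum over speakers; with MFCC input, each vector would hold one frame's coefficients.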