Accuracy of MFCC-Based Speaker Recognition in Series 60 Device


Juhani Saastamoinen, Department of Computer Science, University of Joensuu, P.O. Box 111, 80101 Joensuu, Finland

Evgeny Karpov, Department of Computer Science, University of Joensuu, P.O. Box 111, 80101 Joensuu, Finland

Ville Hautamäki, Department of Computer Science, University of Joensuu, P.O. Box 111, 80101 Joensuu, Finland

Pasi Fränti, Department of Computer Science, University of Joensuu, P.O. Box 111, 80101 Joensuu, Finland

Received 1 October 2004; Revised 14 June 2005; Recommended for Publication by Markus Rupp

A fixed point implementation of speaker recognition based on MFCC signal processing is considered. We analyze the numerical error of the MFCC and its effect on the recognition accuracy. Techniques to reduce the information loss in a converted fixed point implementation are introduced. We increase the signal processing accuracy by adjusting the ratio of representation accuracy of the operators and the signal. The signal processing error is found to be more important to the speaker recognition accuracy than the error in the classification algorithm. The results are verified by applying the alternative technique to speech data. We also discuss the specific programming requirements set by Symbian and Series 60.

Keywords and phrases: speaker identification, fixed point arithmetic, round-off error, MFCC, FFT, Symbian.
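The central idea of the abstract can be illustrated with a small simulation: how the fractional bits are split between the operands changes the round-off error of a fixed-point product. This is a hypothetical sketch, not the authors' implementation. The helper names (`to_fixed`, `fixed_mul`, `product_error`), the round-half-up rescaling, and the fixed budget of 30 fractional bits shared by the signal and the coefficient are all assumptions chosen for illustration.

```python
# Hypothetical illustration (not the paper's code): round-off error of a
# fixed-point product when the fractional bits are split differently
# between the signal sample and the coefficient it is multiplied by.


def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x to an integer carrying frac_bits fractional bits.

    Uses Python's round(), i.e. round to nearest with ties to even.
    """
    return int(round(x * (1 << frac_bits)))


def fixed_mul(a: int, a_frac: int, b: int, b_frac: int, out_frac: int) -> int:
    """Multiply two fixed-point integers and rescale to out_frac fractional bits."""
    prod = a * b                          # exact product has a_frac + b_frac fractional bits
    shift = a_frac + b_frac - out_frac    # bits to discard when rescaling
    if shift <= 0:
        return prod << -shift             # nothing to discard, only widen
    return (prod + (1 << (shift - 1))) >> shift  # round half up, then drop low bits


def product_error(signal: float, coeff: float,
                  sig_frac: int, coef_frac: int, out_frac: int) -> float:
    """Absolute error of the fixed-point product versus the exact float product."""
    a = to_fixed(signal, sig_frac)
    b = to_fixed(coeff, coef_frac)
    y = fixed_mul(a, sig_frac, b, coef_frac, out_frac)
    return abs(y / (1 << out_frac) - signal * coeff)


if __name__ == "__main__":
    # Same 30-bit fractional budget, split two ways (assumed values).
    print("signal 8 bits, coeff 22 bits:", product_error(0.0123, 0.7071, 8, 22, 15))
    print("signal 22 bits, coeff 8 bits:", product_error(0.0123, 0.7071, 22, 8, 15))
```

With a small-amplitude sample such as 0.0123 and a coefficient of 0.7071 (roughly an FFT twiddle factor), the second split gives a far smaller error than the first. With only 8 fractional bits, the sample is quantized to 3/256 before the multiplication, and no amount of coefficient precision can recover that loss.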

1. INTRODUCTION

Speech research and application development deal with three main problems: speech synthesis, speech recognition, and speaker recognition. We are working in a speech technology project where one of the main goals is to integrate an automatic speaker recognition technique into Series 60 mobile phones. In speaker recognition, we have a recorded speech sample and try to determine to whom the voice belongs. This study involves closed-set speaker identification, where an unknown sample is compared to previously trained voice models in a speaker database. Speaker identification is a speech classification problem. Based on the training material, we create speaker-specific voice models, which divide the feature space into distinct classes. Unknown speech is transformed into a sequence of features, which are scored against the voice models. The speaker whose model has the best overall match with the input features is identified.

There are many ways to choose the features and to decide how they are used. Our research team has studied, for example, how the feature design [1], or the concurrent use of multiple features [2], affects the recognition accuracy. Our speaker identification method is a generic automatic learning classification with mel-frequency cepstral coefficient (MFCC) features. The classification algorithm that we use in this study is a common unsupervised vector quantizer. We have ported the identification system to a Series 60 Symbian mobile phone. In this study, we introduce the Series 60 pl
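The closed-set identification procedure described above trains one model per speaker, scores an unknown feature sequence against every model, and picks the best match. It can be sketched with a vector quantizer. This is a minimal sketch under stated assumptions. Each speaker model is a codebook trained with plain k-means. The match score is the average squared quantization distortion, where lower is better. The function names are hypothetical. The paper only states that it uses a common unsupervised vector quantizer, so its training algorithm and distance measure may differ. Real input would be MFCC vectors.

```python
# Minimal sketch (assumptions: k-means codebooks, squared Euclidean distance,
# average distortion as the match score; names are hypothetical).
import random
from typing import Dict, List, Sequence

Vector = List[float]


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def train_codebook(features: List[Vector], size: int,
                   iters: int = 20, seed: int = 0) -> List[Vector]:
    """Train a speaker model (a codebook of `size` code vectors) with plain k-means."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(features, size)]  # initial code vectors
    for _ in range(iters):
        cells: List[List[Vector]] = [[] for _ in range(size)]
        for v in features:  # assign each training vector to its nearest code vector
            k = min(range(size), key=lambda j: sq_dist(v, codebook[j]))
            cells[k].append(v)
        for k, cell in enumerate(cells):  # move each code vector to its cell centroid
            if cell:  # an empty cell keeps its previous code vector
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def avg_distortion(features: List[Vector], codebook: List[Vector]) -> float:
    """Mean squared quantization error of `features` against one speaker model."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in features) / len(features)


def identify(features: List[Vector], models: Dict[str, List[Vector]]) -> str:
    """Closed-set identification: return the speaker whose model gives the lowest distortion."""
    return min(models, key=lambda name: avg_distortion(features, models[name]))
```

A typical call would be `identify(unknown_vectors, {"speaker1": train_codebook(train1, 16), "speaker2": train_codebook(train2, 16)})`, where the speaker labels, training sets, and codebook size are placeholders. Because the score is a distortion, `identify` takes the minimum over the models. A similarity-based score would take the maximum instead.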
