Prosodic Features Based Text-dependent Speaker Recognition with Short Utterance


1 School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China
[email protected], [email protected]

2 School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou, China
[email protected]

Abstract. Over the past several years, Gaussian mixture models (GMMs) have been the dominant modeling approach in text-independent speaker recognition. However, the recognition accuracy of these models declines as utterances become shorter. Mel-frequency cepstral coefficients are currently the usual choice for characterizing the properties of the vocal tract and are widely applied in speech recognition. In addition, prosodic features, such as pitch and formant, are generally considered to describe the glottal characteristics. However, the efficiency of those approaches remains unsatisfactory. In text-dependent speaker verification systems with short utterances, prosodic features can in theory help improve the recognition result. To optimize the performance of speaker verification systems under the adapted GMM-UBM framework, we adopt a variant speaker verification system based on prosodic features, in which a dual-judgment mechanism integrates vocal tract features with prosodic features. Experimental results show that the new system yields better recognition results.

Keywords: Speaker verification · Text dependent · Prosodic features · Dual judgment mechanism

1 Introduction

As one of the most natural biometric identification methods, speaker recognition has great potential in fields such as convergent key [1, 2], ordinary digital signatures, and biometric key [3]. Speaker recognition technology [4], which aims to recognize speaker identities automatically, is becoming more and more attractive. Meanwhile, short utterance speaker recognition (SUSR) has become a research hotspot. GMM-UBM and GMM-SVM [5, 6], which are based on clustering and subspace techniques, are two popular speaker recognition methods. For systems built on such structures, [7] illustrates how performance changes with the valid test utterance length on the NIST SRE 2005 database: the Equal Error Rate (EER) increases sharply as the test utterances become shorter.

© Springer Science+Business Media Singapore 2016. K. Li et al. (Eds.): ISICA 2015, CCIS 575, pp. 541–552, 2016. DOI: 10.1007/978-981-10-0356-1_57
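To make the Equal Error Rate used in the introduction concrete, the following minimal sketch estimates it from two lists of verification scores. It is an illustration only, not the evaluation code of this paper or of [7]: the function name `equal_error_rate`, the convention that a higher score means "more likely the claimed speaker", and the toy scores are all assumptions. The EER is taken at the threshold where the false acceptance rate and the false rejection rate are closest.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None  # (gap between FAR and FRR, eer estimate, threshold)
    for t in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores (hypothetical, for illustration only).
genuine = [2.0, 1.5, 1.0, 0.2]
impostor = [0.5, 0.1, -0.3, -1.0]
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)
```

With short test utterances the genuine and impostor score distributions overlap more, so the threshold at which the two error rates meet corresponds to a higher error value, which is the effect reported in [7].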

J. Zhang et al.

To address the problem of large data requirements, research has led to Joint Factor Analysis (JFA), Support Vector Machine (SVM), and i-vector based technologies. The factor analysis subspace estimation and the i-vector method introduced in [8, 9] decrease the number of redundant model parameters in order to develop more accurate speaker models. Some methods try to improve performance by selecting segments with higher discriminability on speaker characteristics. In other works on short utterance speaker recognition, such as [10], a dimension-decoupled GMM is applied. Training and testing with 10 s of speech on variations of GMM and SVM have
