Stochastic Feature Transformation with Divergence-Based Out-of-Handset Rejection for Robust Speaker Verification


Man-Wai Mak
Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Email: [email protected]

Chi-Leung Tsang
Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Email: [email protected]

Sun-Yuan Kung
Department of Electrical Engineering, Princeton University, NJ 08544, USA
Email: [email protected]

Received 7 October 2002; Revised 20 June 2003

The performance of telephone-based speaker verification systems can be severely degraded by linear and nonlinear acoustic distortion caused by telephone handsets. This paper proposes to combine a handset selector with stochastic feature transformation to reduce the distortion. Specifically, a Gaussian mixture model (GMM)-based handset selector is trained to identify the most likely handset used by the claimants, and then handset-specific stochastic feature transformations are applied to the distorted feature vectors. This paper also proposes a divergence-based handset selector with out-of-handset (OOH) rejection capability to identify the “unseen” handsets. This is achieved by measuring the Jensen difference between the selector’s output and a constant vector with identical elements. The resulting handset selector is combined with the proposed feature transformation technique for telephone-based speaker verification. Experimental results based on 150 speakers of the HTIMIT corpus show that the handset selector, either with or without OOH rejection capability, is able to identify the “seen” handsets accurately (98.3% in both cases). Results also demonstrate that feature transformation performs significantly better than the classical cepstral mean normalization approach. Finally, by using the transformation parameters of the seen handsets to transform the utterances with correctly identified handsets and processing those utterances with unseen handsets by cepstral mean subtraction (CMS), verification error rates are reduced significantly (from 12.41% to 6.59% on average).

Keywords and phrases: robust speaker verification, feature transformation, divergence, handset distortion, EM algorithm.
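The OOH rejection idea described in the abstract can be illustrated with a minimal sketch. It assumes the standard entropy-based definition of the Jensen difference, J(a, r) = S((a + r)/2) - [S(a) + S(r)]/2 with S the Shannon entropy, and it assumes the selector's output is a vector of normalized scores, one per seen handset. The function names, the decision direction (reject when the output is close to the constant vector), and the threshold value are hypothetical choices for illustration; the excerpt above does not specify them.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def jensen_difference(alpha, r):
    """Jensen difference between two distributions of equal length.

    Assumed form: S((alpha + r) / 2) - (S(alpha) + S(r)) / 2.
    It is zero when alpha equals r and grows as they diverge.
    """
    mid = [(a + b) / 2.0 for a, b in zip(alpha, r)]
    return entropy(mid) - 0.5 * (entropy(alpha) + entropy(r))


def select_handset(alpha, threshold):
    """Return (handset_index or None, jensen_value).

    alpha: normalized selector scores, one per seen handset (assumption).
    threshold: hypothetical rejection threshold; it would need tuning.
    A near-uniform alpha means the selector cannot tell the handsets
    apart, so the utterance is treated as coming from an unseen handset.
    """
    h = len(alpha)
    uniform = [1.0 / h] * h  # constant vector with identical elements
    j = jensen_difference(alpha, uniform)
    if j < threshold:
        return None, j  # out-of-handset: fall back to e.g. CMS
    best = max(range(h), key=lambda k: alpha[k])
    return best, j
```

For example, `select_handset([0.97, 0.01, 0.01, 0.01], 0.1)` accepts handset index 0, whereas a flat output such as `[0.25, 0.25, 0.25, 0.25]` gives a Jensen difference of zero and is rejected as unseen.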

1. INTRODUCTION

Recently, speaker verification over the telephone has attracted much attention, primarily because of the proliferation of electronic banking and electronic commerce. Although substantial progress in telephone-based speaker verification has been made, two issues have hindered the pace of development. First, sensitivity to handset variations remains a challenge: transducer variability could result in acoustic mismatches between the speech data gathered from different handsets. Second, the accuracy of handset identification is a concern: a wrong identification for the handset used by the speaker can result in wrong handset compensation. To enhance the pract
