A Score-Level Solution to Speaker Verification Using UBM Pooling and Adaptive Cohort Selection


Abstract In an environment that is highly unpredictable in nature, a speaker verification system needs a good background model to carry out the verification task reliably. In this paper, a 1024-component UBM is created by pooling a noisy speech UBM and a clean speech UBM. This pooled UBM is used for speaker adaptation as well as for speaker testing. Experimental results have shown a minor improvement with the pooled UBM compared with the baseline UBM. In addition, a score-level solution is proposed by means of cohort model selection using HT-normalization to reduce undesirable variation arising from acoustically mismatched devices and environments. For cohort selection, a simple distance metric based on similarity modeling of each client speaker is used. The normalization parameters computed over a group of speakers (cohort) having some common characteristics are used in the final score calculation. Experiments on a noisy corpus have shown reasonable improvements in performance when normalization parameters were taken from a cohort rather than from a general group. Experiments have shown recognition rates of 90.58% and 87.64% for matched handset type in office and roadside environments, respectively.

Keywords: Score normalization, GMM-UBM, Cohort, T-normalization, UBM pooling
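The UBM pooling step described in the abstract can be illustrated with a short sketch. It is not the author's implementation: it assumes both source UBMs are diagonal-covariance GMMs with 512 components each over the same 39-dimensional features, and that each source UBM receives half of the pooled mixture weight (`alpha = 0.5`). The names `DiagGMM`, `pool_ubms` and `random_ubm` are hypothetical, and the random models merely stand in for trained noisy-speech and clean-speech UBMs.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class DiagGMM:
    """Hypothetical container for a diagonal-covariance GMM."""
    weights: np.ndarray    # (M,) mixture weights, summing to 1
    means: np.ndarray      # (M, D) component means
    variances: np.ndarray  # (M, D) diagonal covariances


def pool_ubms(ubm_a: DiagGMM, ubm_b: DiagGMM, alpha: float = 0.5) -> DiagGMM:
    """Concatenate the components of two UBMs into one larger UBM.

    alpha is the share of mixture weight given to ubm_a (assumed 0.5 here).
    """
    if ubm_a.means.shape[1] != ubm_b.means.shape[1]:
        raise ValueError("UBMs must use the same feature dimension")
    weights = np.concatenate([alpha * ubm_a.weights, (1.0 - alpha) * ubm_b.weights])
    means = np.vstack([ubm_a.means, ubm_b.means])
    variances = np.vstack([ubm_a.variances, ubm_b.variances])
    # Renormalize to guard against rounding drift in the source weights
    return DiagGMM(weights / weights.sum(), means, variances)


# Toy usage: random stand-ins for a noisy-speech and a clean-speech UBM
rng = np.random.default_rng(0)


def random_ubm(m: int, d: int) -> DiagGMM:
    w = rng.random(m)
    return DiagGMM(w / w.sum(), rng.normal(size=(m, d)), rng.uniform(0.5, 2.0, size=(m, d)))


noisy_ubm = random_ubm(512, 39)
clean_ubm = random_ubm(512, 39)
pooled = pool_ubms(noisy_ubm, clean_ubm)
print(pooled.means.shape)  # (1024, 39)
```

Because the components are simply concatenated, the pooled model can serve as a drop-in UBM for adaptation and scoring; choosing a different `alpha` would bias the background model toward noisy or clean conditions.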

P. Das, Assam Don Bosco University, Guwahati, India
© Springer Nature Singapore Pte Ltd. 2018
A. Kalam et al. (eds.), Advances in Electronics, Communication and Computing, Lecture Notes in Electrical Engineering 443, https://doi.org/10.1007/978-981-10-4765-7_49

1 Introduction

In general, for an optimal likelihood ratio test, a speaker verification system models the alternative hypothesis with a universe of probable background imposter speakers. Such a universe of background imposter speakers is called a Universal Background Model (UBM) [1] or a world model. The UBM plays a vital role in the verification task by minimizing non-speaker-related variations, and thus helps to reach a stable decision. One of the major approaches to modeling a UBM is to collect speech from a number of speakers representative of the population of speakers and to train a single model from that population. The focus of this approach is mainly on the composition and selection of the speakers and on the speech used to train the UBM [2, 3].

To improve the effectiveness of the detection threshold in a speaker verification system, a technique called score normalization is commonly used. In this technique, the output scores are transformed by aligning them with the score distribution of each individual speaker model. Score normalization also minimizes speaker-dependent and speaker-independent variations in the signal. For example, Z-norm minimizes speaker differences in imposter score distributions, whereas H-norm minimizes bias effects resulting from different microphones and channels. One of the most widely used methods for score normalization at test time is Test-normalization, or T-norm [4]. In this method, a test utterance is scor
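The GMM-UBM likelihood ratio test and a cohort-based T-norm can be illustrated with a minimal NumPy sketch. This is not the author's method: raw scores are assumed to be average per-frame log-likelihood ratios between a target GMM and the UBM, and the cohort is assumed to be the `k` background speakers whose mean supervectors are closest in Euclidean distance to the client's, a stand-in for the paper's similarity-based distance metric. The H-norm half of HT-normalization is not shown, and all function names (`avg_loglik`, `llr_score`, `select_cohort`, `tnorm`) and the toy numbers are hypothetical.

```python
import numpy as np


def avg_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood of frames x (T, D) under a diagonal-covariance GMM."""
    diff = x[:, None, :] - means[None, :, :]                        # (T, M, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])
    log_weighted = log_gauss + np.log(weights)[None, :]             # (T, M)
    peak = log_weighted.max(axis=1, keepdims=True)                  # log-sum-exp for stability
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())


def llr_score(x, target, ubm):
    """Log-likelihood ratio: target model vs. UBM (the alternative hypothesis)."""
    return avg_loglik(x, *target) - avg_loglik(x, *ubm)


def select_cohort(client_sv, background_svs, k):
    """Indices of the k background speakers closest to the client.

    Assumed similarity measure: Euclidean distance between mean supervectors.
    """
    dists = np.linalg.norm(background_svs - client_sv[None, :], axis=1)
    return np.argsort(dists)[:k]


def tnorm(raw_score, cohort_scores):
    """Normalize a raw score by the mean/std of the same utterance's scores on the cohort models."""
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    return float((raw_score - cohort_scores.mean()) / cohort_scores.std())


# Toy usage with made-up models and scores
rng = np.random.default_rng(1)
ubm = (np.full(4, 0.25), rng.normal(size=(4, 2)), np.ones((4, 2)))
frames = rng.normal(size=(50, 2))
print(llr_score(frames, ubm, ubm))  # 0.0 (identical models)

client_sv = np.array([0.0, 0.0])
background_svs = np.array([[0.1, 0.0], [5.0, 5.0], [0.0, 0.2], [-3.0, 4.0], [0.3, 0.1]])
cohort = select_cohort(client_sv, background_svs, k=3)
print(cohort)  # [0 2 4]

# Assumed LLR scores of one test utterance against each background model
background_scores = np.array([-1.0, -4.0, 0.0, -3.0, 1.0])
normalized = tnorm(2.0, background_scores[cohort])
print(round(normalized, 3))  # 2.449
```

Restricting the normalization statistics to a client-specific cohort, rather than to all background speakers, is what distinguishes this from a plain T-norm over a general group.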
