A Score-Level Solution to Speaker Verification Using UBM Pooling and Adaptive Cohort Selection

Abstract In a highly unpredictable environment, a speaker verification system needs a good background model to carry out the verification task reliably. In this paper, a 1024-component UBM is created by pooling a noisy-speech UBM and a clean-speech UBM. This pooled UBM is used for speaker adaptation as well as for speaker testing. Experimental results show a minor improvement with the pooled UBM compared to the baseline UBM. In addition, a score-level solution is proposed by means of cohort model selection using HT-normalization to reduce undesirable variation arising from acoustically mismatched devices and environments. For cohort selection, a simple distance metric based on similarity modeling of each client speaker is used. The normalization parameters computed over a group of speakers (cohort) having some common characteristics are used in the final score calculation. Experiments on a noisy corpus have shown reasonable improvements in performance when the normalization parameters were taken from a cohort rather than from a general group. Experiments have shown recognition rates of 90.58% and 87.64% for matched handset type in office and roadside environments, respectively.

Keywords Score normalization · GMM-UBM · Cohort · T-normalization · UBM pooling
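The two ideas summarized in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. The function names `pool_ubms` and `cohort_t_norm`, the equal weight split `share_a=0.5`, and the use of diagonal covariances are assumptions made for illustration. The abstract does not state how the two UBMs are combined, so pooling is shown here as a plain concatenation of components with rescaled weights. The normalization follows the usual T-norm form, in which a raw score is standardized by the mean and standard deviation of the scores obtained for the same test utterance against the cohort models.

```python
import numpy as np


def pool_ubms(weights_a, means_a, vars_a, weights_b, means_b, vars_b, share_a=0.5):
    """Pool two diagonal-covariance GMMs by concatenating their components.

    share_a is the fraction of total mixture weight given to model A.
    This split is an assumption; the source does not specify it.
    """
    weights = np.concatenate([share_a * np.asarray(weights_a),
                              (1.0 - share_a) * np.asarray(weights_b)])
    means = np.vstack([means_a, means_b])
    variances = np.vstack([vars_a, vars_b])
    return weights, means, variances


def cohort_t_norm(raw_score, cohort_scores):
    """Standardize a raw score by the statistics of the cohort scores."""
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    mu = cohort_scores.mean()
    sigma = cohort_scores.std()
    # Guard against a degenerate cohort whose scores are all identical.
    return (raw_score - mu) / max(sigma, 1e-8)


# Toy example: two 2-component models over 3-dimensional features.
rng = np.random.default_rng(0)
w_noisy, w_clean = np.array([0.4, 0.6]), np.array([0.7, 0.3])
m_noisy, m_clean = rng.normal(size=(2, 3)), rng.normal(size=(2, 3))
v_noisy, v_clean = np.ones((2, 3)), np.ones((2, 3))

pooled_w, pooled_m, pooled_v = pool_ubms(w_noisy, m_noisy, v_noisy,
                                         w_clean, m_clean, v_clean)
normalized = cohort_t_norm(2.0, [0.0, 1.0, 2.0])
```

With two 512-component models the same call would yield 1024 components, which matches the size quoted in the abstract, although the split between the noisy and clean models is not given there.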

P. Das, Assam Don Bosco University, Guwahati, India, e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2018. A. Kalam et al. (eds.), Advances in Electronics, Communication and Computing, Lecture Notes in Electrical Engineering 443, https://doi.org/10.1007/978-981-10-4765-7_49

1 Introduction

In general, a speaker verification system models the alternative hypothesis with a universe of probable background imposter speakers so that an optimal likelihood ratio test can be applied. Such a universe of background imposter speakers is called a Universal Background Model (UBM) [1] or a World Model. The UBM plays a vital role in the verification task by minimizing non-speaker-related variations and thus helps in reaching a stable decision. One of the major approaches to modeling a UBM is to collect speech from a number of speakers representative of the population of speakers. A single model is then trained from that population. The focus of this approach is mainly on the composition and selection of the speakers and on the speech used to train the UBM [2, 3].

To improve the effectiveness of the detection threshold in a speaker verification system, a technique called score normalization is commonly used. In this technique, the output scores are transformed by aligning them with the score distributions of individual speaker models. Score normalization also minimizes the speaker-dependent and speaker-independent changes in the signal. For example, Z-norm minimizes speaker differences in imposter score distributions, whereas H-norm minimizes bias effects resulting from different microphones and channels. One of the most widely used methods for score normalization at the time of testing is Test-normalization, or T-norm [4]. In this method, a test utterance is scor