Mitigate the reverberation effect on the speaker verification performance using different methods



Khamis A. Al‑Karawi¹

Received: 9 November 2019 / Accepted: 7 November 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Speech signals recorded in the far field, with the receiver distant from the talker, typically contain additive noise and reverberation. These degrade the reliability and intelligibility of the speech signal and the performance of speaker recognition systems, with severe consequences in a wide range of real applications. Channel equalization (removing or reducing channel effects) and other cleaning methods mitigate the mismatch problem to some extent, but at the cost of added distortion to the already vulnerable speech signal, so their effectiveness is limited. Recent research indicates that a newer speaker feature, gammatone frequency cepstral coefficients (GFCC), is more robust to noise and reverberation than other features. This paper proposes two methods to combat the effect of reverberation on speaker verification performance. The first uses GFCC as a robust feature to alleviate the effect of reverberation on system performance; the second uses multi-training to combat the reverberation effect. Speaker verification experiments in artificial and real reverberant conditions show the efficiency of the proposed methods in terms of a decreased equal error rate (EER) and the detection error trade-off (DET) curve.

Keywords  MFCC · GFCC · Reverberate · Robustness · Speaker verification
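The equal error rate (EER) used as the evaluation metric is the operating point at which the false-acceptance rate equals the false-rejection rate. As a minimal sketch, and not the evaluation code used in this paper, it can be estimated from trial scores by sweeping a decision threshold. The function name `equal_error_rate` and the toy scores are illustrative assumptions.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER from verification scores (hypothetical helper).

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False rejection: genuine-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR as the EER estimate.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, impostors)
print(eer, threshold)
```

The (FAR, FRR) pairs visited during the sweep are the points of a DET curve, which is conventionally drawn on normal-deviate axes so that a lower curve indicates a better system.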

* Khamis A. Al‑Karawi, [email protected]
1 Present Address: University of Diyala, Baqubah, Diyala, Iraq

1 Introduction

Speaker recognition can attain high accuracy in controlled acoustic conditions, offering a theoretically reliable means to authenticate or recognize speakers. In real-world applications, speech signals are acquired in diverse acoustic environments, in the presence of various background noises and reverberation (Al-Karawi et al. 2015). Reverberation and noise are properties of the acoustic transmission channel between the talker's mouth and the microphone. In room acoustics, it is well known that the room impulse response (RIR) completely defines the reverberation properties of an enclosure for a specific source-receiver position. The RIR comprises the direct sound, an early-reflections part that mainly affects the signal's timbre and is perceived as coloration, and a late part that produces a decaying, noise-like effect (Rossing 2007). The early reflections combined with the direct signal are usually beneficial for speaker recognition (Failed 2010), while late reverberation is the principal cause of ASR degradation (Petrick et al. 2007). Many methods to mitigate the impact of reverberation have been described, e.g. (Zhao et al. 2014). To address this issue, several efforts, to name a few, microphone arrays (González-Rodríguez et al. 1996) to decrease room