On the performance of empirical mode decomposition-based replay spoofing detection in speaker verification systems
REGULAR PAPER
Sapan H. Mankad1 · Sanjay Garg1
Received: 19 January 2020 / Accepted: 14 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Automatic speaker verification (ASV) systems are most threatened by replay spoofing attacks. The high-frequency regions of the audio signal carry evidence of such attacks. It is therefore useful to decompose the signal into frequency bands or regions for analysis. In this paper, an empirical mode decomposition (EMD)-based replay spoofing detection system is presented. Using EMD, each signal is decomposed into several monotonic intrinsic mode functions (IMFs). For spoofing detection, the signal is then reconstructed and represented using different combinations of one or more subsets of these IMFs. Results on the ASVspoof 2017 version 2.0 and AVspoof benchmark replay attack datasets indicate that the initial IMFs can carry replay attack patterns, so processing them may suffice in place of the entire signal. The proposed approach can also serve as a preprocessing technique by employing a dimension reduction strategy. Cross-corpus experiments on the systems indicate the limitations of ASV antispoofing systems under mismatched conditions.
Keywords Automatic speaker verification · Replay spoofing · Antispoofing · Empirical mode decomposition · Countermeasures
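The decomposition and partial reconstruction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`emd`, `reconstruct`), the fixed number of sifting iterations, the synthetic two-tone test signal, and the choice of keeping the first two IMFs are all assumptions made for illustration. The envelopes use linear interpolation to keep the example dependent on NumPy only, whereas classic EMD uses cubic splines.

```python
import numpy as np

def _extrema(x):
    """Return indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def _envelope(idx, x):
    """Envelope through the given extrema.

    Assumption: linear interpolation, with both endpoints pinned to the
    signal; classic EMD uses cubic splines instead.
    """
    n = len(x)
    pts = np.concatenate(([0], idx, [n - 1]))
    return np.interp(np.arange(n), pts, x[pts])

def emd(x, max_imfs=6, sift_iters=10):
    """Simplified EMD: returns (imfs, residue) with x == imfs.sum(0) + residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residue)
        if len(maxima) + len(minima) < 3:
            break  # too few extrema left to extract another mode
        h = residue.copy()
        for _ in range(sift_iters):  # fixed-count sifting (an assumption)
            maxima, minima = _extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            mean_env = 0.5 * (_envelope(maxima, h) + _envelope(minima, h))
            h = h - mean_env  # remove the local mean
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue

def reconstruct(imfs, n_first):
    """Rebuild a signal from the first n_first IMFs only."""
    return imfs[:n_first].sum(axis=0)

# Synthetic stand-in for a speech frame: a low tone plus a weaker high tone.
fs = 16000
t = np.arange(0, 0.05, 1 / fs)
x = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 3000 * t)

imfs, residue = emd(x)
first_two = reconstruct(imfs, 2)  # hypothetical choice of subset size
```

Because each IMF is subtracted from the running residue, summing all IMFs and the final residue recovers the input. Keeping only the initial IMFs discards the slower modes, which is the sense in which a subset can act as a reduced representation of the signal.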
Sanjay Garg · Sapan H. Mankad
1 CSE Department, Institute of Technology, Nirma University, Ahmedabad, India

1 Introduction
Automatic speaker verification (ASV) systems provide biometric solutions for voice-based person authentication. A typical ASV system accepts a claimed voice signal as input and predicts the class associated with it by comparing its model with the claimed speaker's stored pattern. Thus, ASV is a 1:1 comparison task in which a given input utterance is compared with the claimed speaker's model, and the claim is accepted or rejected using a pre-defined threshold. ASV systems have been of interest to researchers for more than three decades [1–8]. However, commercial deployment of such systems is still a long way off because of their vulnerabilities and limitations in diverse conditions. Spoofing attacks are the most common and difficult challenge to ASV systems [9–11]. Recent ASVspoof challenges [12,13] have started addressing these issues, but research on countermeasures has not yet matured. The lack of generalized countermeasures is yet another hindrance to the deployment of such systems. Systems that are effective against a specific attack may fail on other types of attacks, as has been demonstrated [14]. Hence, there is a need for robust, generalized systems that can also withstand previously unseen types of attacks. Another challenge to ASV systems is handling noise in practical scenarios. One of the objectives of the