Voice liveness detection under feature fusion and cross-environment scenario
Sanjay Garg1 · Sapan H Mankad1
Received: 21 August 2019 / Revised: 22 May 2020 / Accepted: 29 June 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Detecting playback spoofing attacks in speaker verification systems is a major challenge. Recent studies on the ASVspoof challenges show that replay attacks are the most difficult to recognize. Antispoofing systems must perform reasonably well at blocking malicious access attempts before voice-biometric systems can be deployed commercially. We present a study of filterbank-based short-term cepstral features for liveness detection to counter replay spoofing attacks on speaker verification systems. The systems are evaluated on the ASVspoof 2017 version 2.0 dataset. Standalone and fused features are investigated experimentally, and the antispoofing systems are assessed using the spoofing detection equal error rate (EER). Improvements of 20.47% and 21.51% over the baseline system are obtained with the standalone and fused approaches, respectively. We also explore the impact of the proposed static inverted Mel frequency cepstral coefficients (IMFCC) based system under mismatched conditions by training and testing it, along with other systems, in different environments (with different background conditions). Results show that the proposed system outperforms the other systems used in this study in all experiments.
Keywords Speaker verification · Anti-spoofing · Countermeasures · Replay attacks · Liveness detection
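The abstract reports results as the spoofing detection equal error rate (EER), the operating point at which the false acceptance rate on spoofed trials equals the false rejection rate on genuine trials. The following is a minimal sketch of how such a value could be computed from detection scores. It assumes that a higher score means "more likely genuine"; the function name `equal_error_rate` and the toy scores are illustrative and do not come from the paper, and official scoring tools may interpolate between thresholds differently.

```python
import numpy as np


def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'more likely genuine'.

    Illustrative helper, not the paper's scoring code.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([genuine, spoof]))

    # False rejection rate: fraction of genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: fraction of spoofed trials scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    # Toy scores (hypothetical): one genuine trial scores below one spoofed trial.
    eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

A lower EER indicates a better countermeasure, so a relative improvement over a baseline corresponds to a reduction of this value.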
Sapan H Mankad [email protected]
Sanjay Garg [email protected]
1 CSE Department, Institute of Technology, Nirma University, Ahmedabad, India
Multimedia Tools and Applications
1 Introduction
Biometric authentication has recently gained great attention. Password- or token-based authentication has been replaced by biometric-enabled systems in many places [12, 31]. Biometrics are natural traits which every person possesses, and they cannot be forgotten, misplaced or stolen. Systems equipped with face, retina, fingerprint or voice based biometric solutions are being deployed at a rapid rate. Major research agencies, institutes and companies1 are also concerned about the need for such systems. Today, computational power and the availability of sufficient data make it possible to test machine learning algorithms on such data and to improve their performance so that these systems can be deployed in practice. Voice, being the most obvious and easiest means of communication, is the first choice for such authentication systems. However, in earlier years, little work was done in this field, as the majority of research focused on fingerprint data.2
Fig. 1 Vulnerable points of attacks on a speaker verification system (after [22, 33])
Automatic Speaker Verification (ASV) is the task of recognizing a person by analysing his or her voice signal. It has several applications including phone banki