Detection of Various Speech Forgery Operations Based on Recurrent Neural Network
Most existed algorithms of speech forensics have been proposed to detect specific forgery operations. In realistic scenes, however, it is difficult to predict the type of the forgery. Since the suspicious speech might have been processed by some unknown f
- PDF / 1,395,881 Bytes
- 12 Pages / 439.37 x 666.142 pts Page_size
- 111 Downloads / 331 Views
and Tingting Wu
College of Information Science and Engineering, Ningbo University, Ningbo 315211, China [email protected]
Abstract. Most existed algorithms of speech forensics have been proposed to detect specific forgery operations. In realistic scenes, however, it is difficult to predict the type of the forgery. Since the suspicious speech might have been processed by some unknown forgery operation, it will give a confusing result based on a classifier for a specific forgery operation. To this end, a forensic algorithm based on recurrent neural network (RNN) and linear frequency cepstrum coefficients (LFCC) is proposed to detect four common forgery operations. The LFCC with its derivative coefficients is determined as the forensic feature. An RNN frame with two-layer LSTM is designed with preliminary experiments. Extensive experiments on TIMIT and UME databases show that the detection accuracy for the intra-database evaluation can achieve about 99%, and the detection accuracy for the cross-database can achieve higher than 88%. Finally, compared with the previous algorithm, better performance is obtained by the proposed algorithm. Keywords: Forensics · Forgery operations · Recurrent neural network
1 Introduction Nowadays, speech recording can be easily forged by some audio software. It will cause a huge threat if we cannot make sure the speech is natural or maliciously modified. Specifically, it will bring an inestimable impact on society when the forged speech is used for news report, court evidence and other fields. In the past decades, digital speech forensics plays a crucial role on identifying the authenticity and integrity of speech recordings. Lots of works have been proposed. In order to detect the compression history of AMR audio, Luo [1] proposed a Stack Autoencoder (SAE) network for extracting the deep representations to classify the double compressed audios with a UBM-GMM classifier. Jing [2] present a detection method based on adaptive least squares and periodicity in the second derivative of an audio signal as a classification feature. For protecting text-dependent speaker verification systems from the spoofing attacks, Jakub [3] proposed an algorithm for detecting the replay attack audio. In [4], Galina use a high-level feature with a GMM classifier to against the synthetize audio in ASVspoof challenge. To detect the electronic disguised speech, Huang [5] proposed a forensic algorithm that adopted the SVM model with the Melfrequency Cepstral Coefficients (MFCC) statistical vectors as acoustic features, including the MFCC and its mean value and correlation coefficients. The experimental results © Springer Nature Singapore Pte Ltd. 2020 S. Yu et al. (Eds.): SPDE 2020, CCIS 1268, pp. 415–426, 2020. https://doi.org/10.1007/978-981-15-9129-7_29
416
D. Yan and T. Wu
show that their algorithm can achieve a high detection accuracy about 90%. In [6], Wang combined Linear Frequency Cepstrum Coefficient (LFCC) statistical moment and formant statistical moment as input features to detect electronic dis
Data Loading...