Detection of Various Speech Forgery Operations Based on Recurrent Neural Network


D. Yan and Tingting Wu

College of Information Science and Engineering, Ningbo University, Ningbo 315211, China

Abstract. Most existing speech forensics algorithms are designed to detect one specific forgery operation. In realistic scenarios, however, the type of forgery is difficult to predict. Because a suspicious speech recording may have been processed by an unknown forgery operation, a classifier trained for one specific operation can give a misleading result. To address this, a forensic algorithm based on a recurrent neural network (RNN) and linear frequency cepstrum coefficients (LFCC) is proposed to detect four common forgery operations. The LFCC together with its derivative coefficients is used as the forensic feature. An RNN framework with two LSTM layers is designed on the basis of preliminary experiments. Extensive experiments on the TIMIT and UME databases show that the detection accuracy reaches about 99% in the intra-database evaluation and exceeds 88% in the cross-database evaluation. Finally, the proposed algorithm outperforms the previous algorithm it is compared with.

Keywords: Forensics · Forgery operations · Recurrent neural network
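As a rough illustration of the feature named in the abstract, the following sketch computes LFCC with appended derivative coefficients using NumPy. It is not the authors' implementation: the frame length, hop size, FFT size, number of filters, number of cepstral coefficients, the use of first- and second-order derivatives, and all function names (`linear_filterbank`, `dct_ii`, `delta`, `lfcc`) are assumptions chosen for the example, since the excerpt does not state them.

```python
import numpy as np


def linear_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are spaced linearly from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    edges = np.linspace(0.0, sample_rate / 2.0, n_filters + 2)  # filter edges in Hz
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)     # FFT bin centres in Hz
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_ii(x, n_out):
    """Unnormalised DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # shape (n_out, n)
    return x @ basis.T


def delta(feats):
    """Derivative along time as a central difference with edge padding (assumed form)."""
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0


def lfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=20, n_ceps=13):
    """Return an array of shape (n_frames, 3 * n_ceps): static, delta, delta-delta."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    energies = power @ linear_filterbank(n_filters, n_fft, sample_rate).T
    static = dct_ii(np.log(energies + 1e-10), n_ceps)  # small offset avoids log(0)
    d1 = delta(static)
    d2 = delta(d1)
    return np.concatenate([static, d1, d2], axis=1)
```

Calling `lfcc(x)` on a one-dimensional array `x` of samples at 16 kHz gives one feature vector per frame. A sequence of such vectors is the kind of input that a recurrent model such as a two-layer LSTM would consume frame by frame. The only difference from MFCC in this sketch is the linear, rather than Mel-scaled, spacing of the filterbank.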

1 Introduction

Nowadays, speech recordings can easily be forged with audio software. This poses a serious threat if we cannot determine whether a recording is natural or has been maliciously modified. In particular, forged speech used in news reports, court evidence and other fields can have an inestimable impact on society. Over the past decades, digital speech forensics has played a crucial role in verifying the authenticity and integrity of speech recordings, and many methods have been proposed. To detect the compression history of AMR audio, Luo [1] proposed a Stacked Autoencoder (SAE) network that extracts deep representations, which are then classified with a UBM-GMM classifier to identify double-compressed audio. Jing [2] presented a detection method based on adaptive least squares, using periodicity in the second derivative of the audio signal as a classification feature. To protect text-dependent speaker verification systems from spoofing attacks, Jakub [3] proposed an algorithm for detecting replayed audio. In [4], Galina used a high-level feature with a GMM classifier to counter synthesized audio in the ASVspoof challenge. To detect electronically disguised speech, Huang [5] proposed a forensic algorithm that uses an SVM model with Mel-frequency Cepstral Coefficient (MFCC) statistical vectors as acoustic features, including the MFCC and its mean values and correlation coefficients. The experimental results show that their algorithm achieves a detection accuracy of about 90%. In [6], Wang combined Linear Frequency Cepstrum Coefficient (LFCC) statistical moment and formant statistical moment as input features to detect electronic dis

© Springer Nature Singapore Pte Ltd. 2020 S. Yu et al. (Eds.): SPDE 2020, CCIS 1268, pp. 415–426, 2020. https://doi.org/10.1007/978-981-15-9129-7_29
