Adversarial Examples Attack and Countermeasure for Speech Recognition System: A Survey

Abstract. Speech recognition technology is profoundly affecting and changing current human-computer interaction. Owing to the remarkable progress of deep learning, the performance of Automatic Speech Recognition (ASR) systems has also increased significantly. As the core component of the speech assistant in smartphones and other smart devices, ASR receives speech and responds accordingly, which allows us to control and interact with those devices remotely. However, speech adversarial examples, which are crafted by adding a tiny perturbation to the original speech, can make the ASR system generate malicious instructions while remaining imperceptible to humans. This new attack brings several potentially severe security risks to deep-learning-based ASR systems. In this paper, we provide a systematic survey of speech adversarial examples. We first propose a taxonomy of existing adversarial examples. Next, we give a brief introduction to existing adversarial examples for acoustic systems, especially for ASR systems, and summarize several major methods of generating speech adversarial examples. Finally, after elaborating on the existing countermeasures against adversarial examples, we discuss the current challenges and countermeasures against speech adversarial examples. We also give several promising research directions for making attacks more realistic and acoustic systems more robust.

Keywords: Speech adversarial examples · Speech recognition systems · Adversarial defense · Deep learning

1 Introduction

Supported by the National Natural Science Foundation of China (Grant No. U1736215, 61672302, 61901237), Zhejiang Natural Science Foundation (Grant No. LY20F020010, LY17F020010), K.C. Wong Magna Fund in Ningbo University.
© Springer Nature Singapore Pte Ltd. 2020. S. Yu et al. (Eds.): SPDE 2020, CCIS 1268, pp. 443–468, 2020. https://doi.org/10.1007/978-981-15-9129-7_31
D. Wang et al.

Deep learning has significantly improved speech recognition, which makes end-to-end ASR systems more achievable. An ASR system receives speech and interprets it as the corresponding command, which allows people to control systems remotely. Because of this convenience and feasibility, ASR systems have been extensively applied in smartphones and home equipment. Smart devices are affecting and changing the way humans interact with machines; for example, people control smart home equipment remotely by speaking. Despite this convenience and progress, ASR systems have been found to carry potential security risks. Recently, ASR systems based on deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples, which are carefully crafted by adding specific noise to normal speech. Adversarial attacks have been extensively investigated in the image domain, e.g., image classification, image segmentation, and object detection. In contrast, there are fewer investigations of speech adversarial attacks.
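The idea of crafting an adversarial example by adding a small perturbation to normal speech can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the surveyed methods: the "recognizer" is a toy two-class linear scorer with random weights (a hypothetical stand-in for a DNN-based ASR model), the input is a synthetic 440 Hz tone rather than real speech, and the function names `scores` and `perturb_toward_target`, the budget `eps`, and the single signed-gradient step are chosen only to keep the example short.

```python
import numpy as np


def scores(w, b, x):
    """Toy linear 'recognizer': returns one score per command class."""
    return w @ x + b


def perturb_toward_target(w, b, x, target, eps):
    """Take one signed-gradient step toward `target`.

    Every sample changes by at most `eps` (an L-infinity budget), and the
    result is clipped to the valid waveform range [-1, 1].
    """
    pred = int(np.argmax(scores(w, b, x)))
    if pred == target:
        return x.copy()
    # For a linear model, the gradient of (score[target] - score[pred])
    # with respect to the input x is simply w[target] - w[pred].
    grad = w[target] - w[pred]
    return np.clip(x + eps * np.sign(grad), -1.0, 1.0)


# Synthetic one-second "utterance" at 16 kHz (assumed, not real speech).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
x = 0.1 * np.sin(2 * np.pi * 440.0 * t)

# Random weights for the hypothetical two-class model.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=(2, x.size))
b = np.zeros(2)

pred = int(np.argmax(scores(w, b, x)))
target = 1 - pred  # the class the attacker wants instead
x_adv = perturb_toward_target(w, b, x, target, eps=0.005)

print("original class:", pred)
print("adversarial class:", int(np.argmax(scores(w, b, x_adv))))
print("largest sample change:", float(np.max(np.abs(x_adv - x))))
```

A linear model is used only because its input gradient has a closed form. Attacks on real ASR systems generally require iterative optimization through the full network, and a fixed per-sample bound does not by itself guarantee that the change is inaudible.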

Fig. 1. An illustration of the adversarial
