Adversarial Examples Attack and Countermeasure for Speech Recognition System: A Survey
Abstract. Speech recognition technology is profoundly affecting and changing current human-computer interaction. Owing to the remarkable progress of deep learning, the performance of Automatic Speech Recognition (ASR) systems has also increased significantly. As the core component of the speech assistant in smartphones and other smart devices, ASR receives speech and responds accordingly, which allows us to control and interact with those devices remotely. However, speech adversarial samples, crafted by adding a tiny perturbation to the original speech, can make the ASR system generate malicious instructions while remaining imperceptible to humans. This new attack brings several potentially severe security risks to deep-learning-based ASR systems. In this paper, we provide a systematic survey of speech adversarial examples. We first propose a taxonomy of existing adversarial examples. Next, we give a brief introduction to existing adversarial examples for acoustic systems, especially ASR systems, and summarize several major methods of generating speech adversarial examples. Finally, after elaborating on the existing countermeasures against adversarial examples, we discuss the current challenges and countermeasures against speech adversarial examples. We also give several promising research directions for making attack construction more realistic and acoustic systems more robust.
Keywords: Speech adversarial examples · Speech recognition systems · Adversarial defense · Deep learning
1 Introduction
Deep learning has significantly improved speech recognition and has made end-to-end ASR systems more achievable. An ASR system receives speech and interprets it as the corresponding command, which allows people to control systems remotely. Owing to this convenience and feasibility, ASR systems have been widely deployed in smartphones and home equipment. Smart devices are changing the way humans interact with machines; for example, people control smart home equipment remotely by speaking. Despite this convenience and progress, ASR systems have been found to carry potential security risks. Recently, ASR systems based on deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples, which are carefully crafted by adding specific noise to normal speech. Adversarial attacks have been extensively investigated in the image domain, e.g., image classification, image segmentation, and object detection. In contrast, there are fewer investigations of speech adversarial attacks.
Supported by the National Natural Science Foundation of China (Grant No. U1736215, 61672302, 61901237), Zhejiang Natural Science Foundation (Grant No. LY20F020010, LY17F020010), K.C. Wong Magna Fund in Ningbo University.
© Springer Nature Singapore Pte Ltd. 2020. D. Wang et al., in S. Yu et al. (Eds.): SPDE 2020, CCIS 1268, pp. 443–468, 2020. https://doi.org/10.1007/978-981-15-9129-7_31
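The idea of crafting an adversarial example by adding a small, bounded noise to normal speech can be illustrated with a minimal sketch. It is not a method taken from this survey: it assumes a toy linear scorer in place of a real DNN-based ASR model, a synthetic sine wave in place of recorded speech, and a single signed-gradient step under an L-infinity budget. The names `craft_perturbation`, `w`, and `epsilon` are hypothetical and chosen only for illustration.

```python
import numpy as np

def craft_perturbation(x, w, epsilon):
    """One signed-gradient step against a toy linear scorer score(x) = w . x.

    The gradient of the score with respect to x is w, so shifting each sample
    by epsilon * sign(w) raises the score as much as possible while keeping
    every sample within epsilon of the original (an L-infinity budget).
    """
    delta = epsilon * np.sign(w)
    # Keep the perturbed waveform inside the valid amplitude range [-1, 1].
    return np.clip(x + delta, -1.0, 1.0)

rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
x = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # synthetic stand-in for 1 s of speech
w = rng.standard_normal(x.shape)         # hypothetical model weights (assumption)

epsilon = 0.002                          # perturbation budget (assumption)
x_adv = craft_perturbation(x, w, epsilon)

# Largest per-sample change, the shift in the toy score, and the
# signal-to-noise ratio of the original waveform against the perturbation.
max_change = np.max(np.abs(x_adv - x))
score_gain = float(w @ x_adv - w @ x)
snr_db = 10 * np.log10(np.sum(x ** 2) / np.sum((x_adv - x) ** 2))

print(f"max change: {max_change:.4f}")
print(f"score gain: {score_gain:.2f}")
print(f"SNR (dB):   {snr_db:.1f}")
```

In this toy setting the perturbation stays small relative to the signal while the score still moves in the chosen direction. Attacks on real ASR systems must additionally handle sequence-level losses, feature extraction, and playback over the air.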
Fig. 1. An illustration of the adversarial