

Wang et al. Cybersecurity (2020) 3:23
https://doi.org/10.1186/s42400-020-00063-5

RESEARCH

Open Access

On the combination of data augmentation method and gated convolution model for building effective and robust intrusion detection

Yixiang Wang1†, Shaohua lv1†, Jiqiang Liu1, Xiaolin Chang1* and Jinqiang Wang2

Abstract

Deep learning (DL) has shown exceptional performance in fields such as intrusion detection. Various augmentation methods have been proposed to improve data quality and, in turn, the performance of DL models. However, the classic augmentation methods cannot be applied to DL models that use system-call sequences to detect intrusions. Previously, the seq2seq model has been explored to augment system-call sequences. Following this work, we propose a gated convolutional neural network (GCNN) model to thoroughly extract the latent information in the augmented sequences. In addition, to enhance the model's robustness, we adopt adversarial training to reduce the impact of adversarial examples on the model. The adversarial examples used in adversarial training are generated by the proposed adversarial sequence generation algorithm. Experimental results across the models evaluated show that the GCNN model extracts the latent information of the augmented data most effectively and achieves the best performance. Furthermore, GCNN with adversarial training enhances robustness significantly.

Keywords: Data augmentation, Intrusion detection system, Machine learning algorithms, System call
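The gated convolution idea named in the abstract can be illustrated with a minimal sketch. It assumes the gate takes the common gated-linear-unit form, in which one convolution is multiplied element-wise by the sigmoid of a second convolution over the same input. The abstract does not specify the layer, so this form, the function names, the embedding table, and the random weights below are all hypothetical and serve only to show the mechanism on a toy system-call sequence.

```python
import math
import random


def sigmoid(x):
    """Logistic function used for the gate."""
    return 1.0 / (1.0 + math.exp(-x))


def conv1d(seq, kernel, bias):
    """Valid 1-D convolution over a list of embedding vectors.

    seq:    list of T vectors, each of dimension D
    kernel: list of k weight vectors, each of dimension D
    Returns T - k + 1 scalar activations.
    """
    k = len(kernel)
    out = []
    for t in range(len(seq) - k + 1):
        total = bias
        for i in range(k):
            for d in range(len(kernel[i])):
                total += kernel[i][d] * seq[t + i][d]
        out.append(total)
    return out


def gated_conv1d(seq, w, b, v, c):
    """Assumed GLU-style gate: conv(seq, w, b) * sigmoid(conv(seq, v, c))."""
    linear = conv1d(seq, w, b)
    gate = conv1d(seq, v, c)
    return [a * sigmoid(g) for a, g in zip(linear, gate)]


# Toy setup (all values hypothetical): 8 system-call IDs, 4-dim embeddings.
rng = random.Random(0)
VOCAB, DIM, K = 8, 4, 3
embedding = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
w = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(K)]
v = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(K)]

calls = [3, 1, 4, 1, 5, 2, 6]  # a made-up system-call ID sequence
features = gated_conv1d([embedding[c] for c in calls], w, 0.0, v, 0.0)
print(len(features))  # one gated feature per window of K consecutive calls
```

Because the gate lies between 0 and 1, each output is a scaled-down copy of the linear convolution, which lets the layer suppress windows of calls that the gate branch considers uninformative.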

*Correspondence: [email protected]
† Yixiang Wang and Shaohua lv contributed equally to this work.
1 Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing Jiaotong University, 3 Shangyuancun, 100044 Beijing, China
Full list of author information is available at the end of the article

Introduction

An intrusion detection system (IDS) is an active defense technique that aims to resist malware and sensitive activities. It mainly identifies malicious intrusions by monitoring network traffic or user behaviors. There are two types of detection systems: misuse-based and anomaly-based. The former works by constructing a database of known attack patterns and then identifying intrusion behaviors according to pre-defined matching rules. The latter focuses on normal behaviors: when the system finds a behavior that deviates from the pre-defined rules, it flags that behavior as an intrusion event. The ability to identify intrusions is a crucial factor in evaluating an intrusion detection system. However, current intrusion detection systems have some limitations. A misuse-based intrusion detection system needs a large library of attack patterns and cannot identify unknown attacks, which leads to a high false-negative rate. Due to the variability of user behavior habits, anomaly detection algorithms tend to have a high false-positive rate. In recent years, there has been a growing number of publications focusing on the analysis of system-call seque