Investigating the Effective Use of Machine Learning Algorithms in Network Intruder Detection Systems
Abstract. Research into the use of machine learning techniques for network intrusion detection, particularly on the popular public dataset KDD Cup 99, has become commonplace during the past decade. The recent popularity of cloud-based computing and the realization of the associated risks are the main reasons for this research thrust. The proposed research demonstrates that machine learning algorithms can be used effectively to enhance the performance of existing intrusion detection systems despite the high misclassification rates reported in the literature. This paper reports on an empirical investigation to determine the underlying causes of the poor performance of some well-known machine learning classifiers, especially when learning from minor classes/attacks. The main factor is that the KDD Cup 99 dataset, which is used in most of the existing research, is imbalanced owing to the nature of the intrusion detection domain: some attacks are rare and some are very frequent. Consequently, there is a significant imbalance amongst the classes in the dataset. Depending on the number of classes in the dataset, the imbalanced dataset issue can be treated as a binary problem or a multi-class problem. Most researchers conduct binary classification because multi-class classification is more complex. In the research presented in this paper, we treat the problem as a multi-class classification task. The paper investigates the use of different machine learning algorithms to overcome the common misclassification problems faced by researchers who used the imbalanced KDD Cup 99 dataset in their investigations. Recommendations are made as to which classifier is best for the classification of imbalanced data.
Keywords: Cloud computing · Imbalanced dataset · Data mining · Intrusion detection
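The effect of class imbalance described in the abstract can be illustrated with a minimal sketch. The label counts below are hypothetical toy values, not the real KDD Cup 99 class distribution, and the helper names `imbalance_ratio` and `per_class_recall` are introduced here for illustration only. The sketch shows how a classifier that always predicts the majority class can reach high overall accuracy while missing every instance of the minor classes.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / total[label] for label in total}


# Hypothetical toy labels (assumption): one frequent class, two rare ones.
y_true = ["dos"] * 90 + ["normal"] * 8 + ["u2r"] * 2

# A degenerate classifier that always predicts the majority class.
y_pred = ["dos"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
macro_recall = sum(recall.values()) / len(recall)

print("imbalance ratio:", imbalance_ratio(y_true))
print("accuracy:", accuracy)
print("per-class recall:", recall)
print("macro-averaged recall:", macro_recall)
```

On this toy data the overall accuracy is high although the two rare classes are never detected, which is why per-class or macro-averaged measures are usually more informative than accuracy for a multi-class task on imbalanced data.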
I. S. Al-Mandhari et al.
© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): FICC 2018, AISC 887, pp. 145–161, 2019. https://doi.org/10.1007/978-3-030-03405-4_10

1 Introduction
Cloud services and operational models, and the technologies used to enable these services, expose organisations to new risks and threats. Moreover, a cloud's distributed nature increases the possibility of its being exploited by intruders. Since cloud services are available through the internet, the key issue to be addressed is cloud security [1]. The confidentiality, integrity, and availability of cloud services and resources are affected by attacks. The authors of [2] argued that the computing cost of cryptographic techniques cannot justify their use in preventing attacks on the cloud. A firewall is another solution that can be used to prevent attacks. However, it is argued that while firewalls work well against outsider attacks, they cannot prevent insider attacks. Because of this, it is recommended to incorporate Intrusion Detection Systems (IDSs) where the internal and external attacks can