Machine Learning Based Classification Accuracy of Encrypted Service Channels: Analysis of Various Factors
(2021) 29:8
Ali Safari Khatouni1 · Nabil Seddigh2 · Biswajit Nandy2 · Nur Zincir-Heywood3
Received: 11 March 2020 / Revised: 19 September 2020 / Accepted: 25 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Visibility into network traffic is a key requirement for security and network monitoring tools. Recent trends in the evolution of Internet traffic make it difficult for traditional traffic analysis methods to accurately classify Internet traffic, including Voice over IP (VoIP), text messaging, video, and audio services, among others. A key aspect of this trend is the rising level of encrypted multiple service channels, where the payload is opaque to middleboxes in the network. In such scenarios, traditional approaches such as Deep Packet Inspection (DPI) or examination of port numbers cannot achieve the required classification accuracy. This work investigates Machine Learning-based network traffic classifiers as a means of accurately classifying encrypted multiple service channels. The study (i) proposes and evaluates two machine learning-based frameworks for multiple service channel analysis; (ii) undertakes feature engineering to identify the minimum number of features required to obtain high accuracy while reducing the effects of over-fitting; (iii) explores the portability and robustness of the frameworks' trained models under different network conditions: location, time, and volume; and (iv) collects and analyzes a large-scale dataset including nine classes of services, for benchmarking purposes.
Keywords Multiple service channels · Encrypted traffic classification · Encrypted traffic analysis · Feature selection · Robust traffic classifier · Machine Learning based traffic analysis
* Ali Safari Khatouni [email protected]
1 Western University, London, Canada
2 Solana Networks, Ottawa, Canada
3 Dalhousie University, Halifax, Canada
Journal of Network and Systems Management
1 Introduction
A large body of prior research exists in the field of network traffic monitoring, analysis, and classification. Despite this, the rapidly evolving nature of application traffic behavior means that accurate classification of encrypted Voice over IP (VoIP), audio, and video traffic remains a challenge that requires further research. For example, VoIP applications encrypt the packet payload and implement different methods to bypass firewalls or proxies [1, 2]. Moreover, recent studies [3–5] indicate a rapid rise in application traffic encryption using Hypertext Transfer Protocol Secure (HTTPS) [6]. The classification task becomes much more challenging due to the convergence of web-based services. HTTP and HTTPS no longer carry just web pages but have become multiple service channels carry