Data representation for CNN based internet traffic classification: a comparative study


Ola Salman¹ · Imad H. Elhajj¹ · Ayman Kayssi¹ · Ali Chehab¹

Received: 1 January 2020 / Revised: 6 July 2020 / Accepted: 28 July 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract It has been well established that the Internet of Things will bring an expansion in traffic volume and types. This will bring new challenges in terms of Quality of Service (QoS) and security, requiring innovative traffic management techniques. Traffic classification is a main network function that helps in managing both QoS and security. Different machine learning based methods have been applied for this aim. However, traditional machine learning methods rely on hand-crafted features, limiting the model's ability to learn. Deep Learning (DL), a branch of machine learning, is characterized by its representation learning ability. In this paper, we analyse two methods of data representation for DL-based classification: a raw packet-based representation and a quasi-raw flow-based representation. Different tests are performed to evaluate the robustness of these data representation methods. The tests include feature importance, model robustness, and anonymization tests. The results show that the raw data representation suffers under traffic anonymization and from the fact that many packet fields are data-dependent. On the other hand, the flow-based representation is sensitive to the number of packets used for classification and to traffic obfuscation.

Keywords Deep learning · Internet of things · Traffic classification · Data representation

¹ American University of Beirut, Beirut 1107 2020, Lebanon

Multimedia Tools and Applications

1 Introduction

The Internet of Things includes a heterogeneous set of connected devices that run different types of applications. These devices and applications will generate different types of traffic, having different requirements in terms of Quality of Service (QoS) and security. Managing both QoS and security in this large-scale network calls for innovative network management techniques [59]. In this context, traffic classification is considered an essential element for traffic engineering, security management, traffic trends analysis, and so on [15]. The ability to classify traffic based on its requirements in terms of bandwidth, latency, throughput, etc., enables the allocation of the corresponding resources to each type of traffic and thus helps guarantee good QoS [4, 5, 7, 24]. On the other hand, traffic classification techniques can be used to detect abnormal traffic [7, 39, 67]. Furthermore, Intrusion Detection Systems (IDSs) use machine learning to identify the attack name/type [28]. Internet traffic consists of the flow of data between the different communication parties. The Internet Protocol (IP) network traffic dominates other Internet traffic typ