A Semi-Supervised Network Traffic Classification Method Based on Incremental Learning
Pinghong Li, Yong Wang and Xiaoling Tao
Abstract To address the low accuracy, high time cost and limited range of application of traditional network traffic classification, a semi-supervised network traffic classification method based on incremental learning is proposed. When training the Support Vector Machine (SVM), the method takes full advantage of a large number of unlabeled samples and a small number of labeled samples to refine the classifiers. Incremental learning is used to avoid unnecessary retraining, which mitigates the low accuracy and long training time of the original classifiers when new samples are added. Drawing on the synergy of multiple classifiers, this paper proposes an improved Tri-training method to train multiple classifiers, overcoming the strict limitations that traditional Co-verification places on classification methods and sample types. Experimental results show that the proposed algorithm achieves excellent accuracy and speed in traffic classification.
Keywords Traffic classification · Support vector machine · Incremental learning · Tri-training · Semi-supervised
100.1 Introduction
Network traffic is an important carrier that records and reflects network status and user activity, and it plays an increasingly important role in effective network management. Network traffic classification [1] classifies the two-way TCP or UDP
P. Li (✉) · Y. Wang · X. Tao
Guilin University of Electronic Technology, No. 1 Jin-ji Road, Qixing District, Guilin, Guangxi, China
e-mail: [email protected]
W. Lu et al. (eds.), Proceedings of the 2012 International Conference on Information Technology and Software Engineering, Lecture Notes in Electrical Engineering 211, DOI: 10.1007/978-3-642-34522-7_100, Springer-Verlag Berlin Heidelberg 2013
streams generated by network communication according to the type of network application (such as WWW, FTP, MAIL or P2P) on the TCP/IP-based Internet. Recently, applying machine learning methods to classify and identify network applications has become a research hotspot. Machine learning has two traditional strategies [2]: supervised learning and unsupervised learning. Supervised learning methods, such as Bayesian methods and decision tree methods, achieve high detection rates, but they require the sample data to be correctly labeled in advance and cannot discover samples of unknown categories. Unsupervised learning methods, such as clustering, group samples according to data similarity. They do not need labeled data, but because they model only unlabeled data, their detection accuracy is low. Semi-supervised learning can take full advantage of a large number of unlabeled samples and a small number of labeled samples, and so makes up for the shortcomings of both supervised and unsupervised learning. In this paper, a novel Least Area-SVM (LA-SVM) traffic classification algorithm is proposed, and we use an improved Tri-training method to train classifie
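As a rough illustration of the Tri-training idea mentioned above, the following Python sketch trains three classifiers on bootstrap samples of the labeled set and lets each one learn from the unlabeled samples on which the other two agree. It is not the improved Tri-training or the LA-SVM of this paper: a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, the error-rate checks of standard Tri-training are omitted, and all names (`CentroidClassifier`, `tri_train`, `vote`) and the two-feature toy flows are assumptions made for illustration.

```python
import random
from collections import Counter


class CentroidClassifier:
    """Nearest-centroid stand-in for the SVM base learner (an assumption)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for d, v in enumerate(xi):
                acc[d] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(sorted(self.centroids), key=dist2)


def tri_train(X_lab, y_lab, X_unlab, rounds=5, seed=0):
    """Simplified Tri-training: no error-rate checks, fixed round limit."""
    rng = random.Random(seed)
    n = len(X_lab)
    models = []
    for _ in range(3):
        # Each initial classifier sees a different bootstrap sample.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(CentroidClassifier().fit([X_lab[i] for i in idx],
                                               [y_lab[i] for i in idx]))
    previous = None
    for _ in range(rounds):
        # Predict with the current models before any of them is retrained.
        preds = [[m.predict_one(x) for x in X_unlab] for m in models]
        pseudo = []
        for i in range(3):
            j, k = [t for t in range(3) if t != i]
            # Classifier i receives the samples on which j and k agree.
            pseudo.append({u: preds[j][u] for u in range(len(X_unlab))
                           if preds[j][u] == preds[k][u]})
        if pseudo == previous:
            break  # pseudo-labels stopped changing
        for i in range(3):
            X_i = list(X_lab) + [X_unlab[u] for u in pseudo[i]]
            y_i = list(y_lab) + list(pseudo[i].values())
            models[i] = CentroidClassifier().fit(X_i, y_i)
        previous = pseudo
    return models


def vote(models, x):
    """Final label by majority vote of the three classifiers."""
    return Counter(m.predict_one(x) for m in models).most_common(1)[0][0]


# Hypothetical toy flows with two made-up features per flow.
X_lab = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
y_lab = ["WWW", "WWW", "P2P", "P2P"]
X_unlab = [(0.1, 0.3), (0.3, 0.2), (4.8, 5.2), (5.3, 5.1), (0.2, 0.4), (4.9, 4.7)]

models = tri_train(X_lab, y_lab, X_unlab)
print(vote(models, (0.1, 0.1)), vote(models, (5.0, 5.1)))
```

In this sketch only four flows carry labels, and the six unlabeled flows are brought into training through the agreement of two classifiers, which is the sense in which a small labeled set and a larger unlabeled set are used together. Swapping in an SVM would only require a class exposing the same `fit` and `predict_one` methods.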