Learning from Others: User Anomaly Detection Using Anomalous Samples from Other Users
1 IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
{young park,molloyim,schari}@us.ibm.com
2 Purdue University, West Lafayette, IN, USA
{xu218,gates,ninghui}@cs.purdue.edu
Abstract. Machine learning is increasingly used as a key technique in solving many security problems such as botnet detection, transactional fraud, insider threat, etc. One of the key challenges to the widespread application of ML in security is the lack of labeled samples from real applications. For known or common attacks, labeled samples are available, and, therefore, supervised techniques such as multi-class classification can be used. However, in many security applications, it is difficult to obtain labeled samples as each attack can be unique. In order to detect novel, unseen attacks, researchers used unsupervised outlier detection or one-class classification approaches, where they treat existing samples as benign samples. These methods, however, yield high false positive rates, preventing their adoption in real applications. This paper presents a local outlier factor (LOF)-based method to automatically generate both benign and malicious training samples from unlabeled data. Our method is designed for applications with multiple users such as insider threat, fraud detection, and social network analysis. For each target user, we compute LOF scores of all samples with respect to the target user’s samples. This allows us to identify (1) other users’ samples that lie in the boundary regions and (2) outliers from the target user’s samples that can distort the decision boundary. We use the samples from other users as malicious samples, and use the target user’s samples as benign samples after removing the outliers. We validate the effectiveness of our method using several datasets including access logs for valuable corporate resources, DBLP paper titles, and behavioral biometrics of user typing behavior. The evaluation of our method on these datasets confirms that, in almost all cases, our technique performs significantly better than both one-class classification methods and prior two-class classification methods. 
Further, our method is a general technique that can be used for many security applications.
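The sample-generation step described in the abstract can be sketched in code. The following is a minimal illustration, not the paper's implementation: it assumes scikit-learn's `LocalOutlierFactor`, and the function name `build_training_sets`, the neighborhood size `k`, the self-outlier quantile, and the boundary LOF band are all hypothetical parameters chosen for the sketch.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def build_training_sets(target_samples, other_samples, k=20,
                        self_outlier_q=0.95, boundary_band=(1.0, 2.0)):
    """Illustrative LOF-based training-set construction for one target user.

    Step 1: drop outliers among the target user's own samples, since they
            can distort the learned decision boundary.
    Step 2: keep only those other-user samples whose LOF with respect to
            the target user's data falls in a boundary region, and use
            them as the malicious class.
    """
    # LOF of the target user's samples with respect to themselves
    lof_self = LocalOutlierFactor(n_neighbors=k)
    lof_self.fit(target_samples)
    self_scores = -lof_self.negative_outlier_factor_  # LOF, roughly >= 1
    keep = self_scores <= np.quantile(self_scores, self_outlier_q)
    benign = target_samples[keep]

    # LOF of other users' samples with respect to the cleaned target data;
    # novelty=True lets us score points outside the fitted set
    lof_ref = LocalOutlierFactor(n_neighbors=k, novelty=True)
    lof_ref.fit(benign)
    other_scores = -lof_ref.score_samples(other_samples)
    lo, hi = boundary_band
    boundary = (other_scores >= lo) & (other_scores <= hi)
    malicious = other_samples[boundary]

    return benign, malicious
```

The resulting `benign` and `malicious` arrays could then feed any standard two-class classifier; the band thresholds here stand in for whatever boundary-region criterion the method actually uses.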
1 Introduction
Driven by an almost endless stream of well-publicized cases of information theft by malicious insiders, such as WikiLeaks and Snowden, there is increased interest in monitoring systems that detect anomalous user behavior. Today, in addition to traditional access control and other security controls, organizations actively deploy activity monitoring mechanisms to detect such attacks. Activity monitoring is done through enforced rules as well as anomaly detection using ML techniques. To best apply ML techniques, it is ideal to train a model with a large number of both anomalous and benign samples.

© Springer International Publishing Switzerland 2015. G. Pernul et al. (Eds.): ESORICS 2015, Part II, LNCS 9327, pp. 396–414, 2015. DOI: 10.1007/978-3-319-24177-7_20