usfAD: a robust anomaly detector based on unsupervised stochastic forest


ORIGINAL ARTICLE

Sunil Aryal1 · K.C. Santosh2 · Richard Dazeley1

Received: 11 April 2020 / Accepted: 16 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
In real-world applications, data can be represented using different units/scales, for example, weight in kilograms or pounds and fuel efficiency in km/l or l/100 km. One unit can be a linear or non-linear scaling of another. The variation in metrics due to non-linear scaling makes Anomaly Detection (AD) challenging. Most existing AD algorithms rely on distance- or density-based functions, which makes them sensitive to how data is expressed; that is, they are representation dependent. To avoid this problem, we introduce a new anomaly detection method, which we call ‘usfAD: Unsupervised Stochastic Forest-based Anomaly Detector’. Our empirical evaluation on synthetic and real-world cybersecurity (spam detection, malicious URL detection and intrusion detection) datasets shows that our approach is more robust to the variation in units/scales used to express data. It produces more consistent and better results than five state-of-the-art AD methods, namely local outlier factor, one-class support vector machine, isolation forest, nearest neighbor in a random subsample of data, and a simple histogram-based probabilistic method.

Keywords Measurement scales and units · Anomaly detection · Outlier detection · Robust anomaly detection · Intrusion detection · Spam detection · Cyber security

1 School of Information Technology, Deakin University, 75 Pigdons Rd, Waurn Ponds, VIC 3216, Australia (Sunil Aryal, Richard Dazeley)

2 Department of Computer Science, University of South Dakota, 414 E Clark St, Vermillion, SD 57069, USA (K.C. Santosh)

1 Introduction

1.1 Background

Anomalies (sometimes also referred to as outliers) are data instances that differ significantly from most of the other data, raising the suspicion that they were generated by a mechanism different from the normal or expected one [23]. Anomaly Detection (AD) is the task of automatically detecting anomalies in a given dataset using computers and algorithms [16]. It has many applications, such as [1]:

• Intrusion detection: Detecting unauthorised access requests and malicious activities in computer networks.

• Fraud detection: Detecting fraudulent and suspicious credit card and other financial transactions in banking.

• Spam detection: Detecting malicious and phishing emails in electronic communications.

Most existing anomaly detection algorithms [3, 4, 15, 26] assume that anomalies have feature values that are significantly different from those of normal instances. In other words, anomalies are few and different, and they lie in low-density regions.
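The sensitivity of distance-based scoring to the unit of measurement, noted in the abstract, can be illustrated with a small sketch. It is not usfAD and not one of the five compared methods: the nearest-neighbour distance score below is a deliberately simple stand-in, the helper names are hypothetical, and the fuel-efficiency values are invented for illustration. The same seven cars are expressed in km/l and in l/100 km (a non-linear rescaling, 100 / x), and the instance with the largest score is reported in each representation.

```python
def nn_distance_scores(values):
    """Anomaly score of each value = distance to its nearest other value."""
    return [
        min(abs(v - w) for j, w in enumerate(values) if j != i)
        for i, v in enumerate(values)
    ]


def most_anomalous(values):
    """Index of the value with the largest nearest-neighbour distance."""
    scores = nn_distance_scores(values)
    return max(range(len(values)), key=scores.__getitem__)


# Invented fuel-efficiency readings for seven cars, in km/l.
km_per_l = [2.0, 4.0, 10.0, 11.0, 12.0, 13.0, 20.0]

# The same cars expressed in l/100 km: a non-linear rescaling of km/l.
l_per_100km = [100.0 / v for v in km_per_l]

top_km = most_anomalous(km_per_l)
top_l = most_anomalous(l_per_100km)

print("Top anomaly in km/l representation:    car", top_km)
print("Top anomaly in l/100 km representation: car", top_l)
```

With these made-up values, the most isolated instance is the 20 km/l car in the first representation and the 2 km/l (50 l/100 km) car in the second, even though the underlying data are identical.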

2 Motivation

In real-world applications, features of data objects can be measured in different units or recorded in different scales [5, 6, 19, 3
