ELOF: fast and memory-efficient anomaly detection algorithm in data streams


METHODOLOGIES AND APPLICATION

Yun Yang¹ · Liang Chen² · ChongJun Fan¹

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Anomaly detection in multivariate data is an important research field. Many studies have sought to improve the local outlier factor (LOF). However, existing LOF-based models have two major problems: (1) they need a large amount of memory to store data; (2) their detection results on high-dimensional data are unsatisfactory. To this end, we propose a new anomaly detection algorithm for data streams, extract local outlier factor (ELOF). To reduce data storage, we first design a memory window mechanism that limits the amount of stored data; we then design a sub-data extraction model that extracts sub-data from the original data. Together, these two designs effectively reduce the amount of data stored. Moreover, the model framework is based on the density discriminant method, so it can be applied to a wide range of real scenarios without any prior information or assumptions about the data distribution. Comprehensive experimental results show that ELOF substantially improves on many common models in terms of accuracy. Furthermore, the running time of the ELOF algorithm is less than 1% of that of the original LOF algorithm on the same data set. These results indicate that ELOF consumes less memory in real-time data stream anomaly detection and performs better on high-dimensional data streams.

Keywords Anomaly detection · LOF · ELOF · Data stream
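To make the memory window idea concrete, the following is a minimal sketch of plain LOF (Breunig et al. 2000) evaluated over a fixed-size window of the most recent points. It is not the ELOF algorithm itself: the sub-data extraction model is not shown, and the names `lof_scores`, `WindowedLOF`, `window_size`, and `k` are hypothetical choices made only for illustration. For brevity it uses exactly k neighbours per point and ignores distance ties.

```python
import math
from collections import deque


def lof_scores(points, k):
    """Plain LOF score of every point (exactly k neighbours, ties ignored)."""
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    # Full pairwise distance table: this is the O(n^2) storage of plain LOF.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    knn = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = sum(max(k_dist[o], dist[i][o]) for o in knn[i])
        # Duplicate points can give reach == 0; this sketch maps that to inf.
        return math.inf if reach == 0 else k / reach

    lrds = [lrd(i) for i in range(n)]
    # LOF: mean ratio of the neighbours' lrd to the point's own lrd.
    return [sum(lrds[o] for o in knn[i]) / (k * lrds[i]) for i in range(n)]


class WindowedLOF:
    """Hypothetical wrapper: keeps only the last `window_size` points."""

    def __init__(self, window_size, k):
        self.window = deque(maxlen=window_size)  # oldest point is evicted
        self.k = k

    def update(self, x):
        """Add a point; return its LOF within the window, or None if too few points."""
        self.window.append(tuple(x))
        if len(self.window) <= self.k:
            return None
        return lof_scores(list(self.window), self.k)[-1]


if __name__ == "__main__":
    detector = WindowedLOF(window_size=9, k=3)
    for x in range(3):
        for y in range(3):
            detector.update((x, y))
    # A point far from the 3 x 3 grid scores well above 1.
    print(detector.update((10.0, 10.0)))
```

Bounding the window caps the distance table at `window_size` × `window_size` entries regardless of stream length, which is the storage effect the memory window is meant to achieve. Recomputing every score on each update is wasteful and is done here only to keep the sketch short.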

Communicated by V. Loia.

✉ Liang Chen [email protected]
Yun Yang [email protected]
ChongJun Fan [email protected]

¹ Business School, University of Shanghai for Science and Technology, Shanghai, China
² East China Normal University, Shanghai, China

1 Introduction

With the advancement of mechanization and intelligence, online anomaly detection is being applied in a growing range of scenarios. For static data with a limited number of samples, many outlier detection algorithms have been proposed. However, research on dynamic data streams is relatively scarce, and traditional static detection methods cannot be applied directly because there is not enough memory to store the entire stream (Sadik and Gruenwald 2014). This limitation makes outlier detection in data streams very challenging, and data stream anomaly detection has become an important research topic in industry. Local outlier factor (LOF) is a well-known density-based outlier detection algorithm for static data (Breunig et al. 2000). It is widely used because it can effectively detect outliers in data of non-uniform density (Salehi et al. 2016; Yan et al. 2017a, b). However, computing LOF requires storing the data values, pairwise distances, local reachability density (lrd) values, and so on, which gives it a space complexity of O(n²). When the input is large, memory usage becomes prohibitive. To reduce the space complexit