A comparative evaluation of novelty detection algorithms for discrete sequences

PDF / 2,476,566 Bytes
26 Pages / 439.37 x 666.142 pts Page_size
52 Downloads / 227 Views

A comparative evaluation of novelty detection algorithms for discrete sequences Rémi Domingues1

· Pietro Michiardi1 · Jérémie Barlet2 · Maurizio Filippone1

© Springer Nature B.V. 2019

Abstract The identification of anomalies in temporal data is a core component of numerous research areas such as intrusion detection, fault prevention, genomics and fraud detection. This article provides an experimental comparison of candidate methods for the novelty detection problem applied to discrete sequences. The objective of this study is to identify which state-of-the-art methods are efficient and appropriate candidates for a given use case. These recommendations rely on extensive novelty detection experiments based on a variety of public datasets in addition to novel industrial datasets. We also perform thorough scalability and memory usage tests resulting in new supplementary insights of the methods’ performance, key selection criteria to solve problems relying on large volumes of data and to meet the expectations of applications subject to strict response time constraints. Keywords Novelty detection · Discrete sequences · Temporal data · Fraud detection · Outlier detection · Anomaly detection

1 Introduction Novelty detection is an unsupervised learning problem and an active research area (Aggarwal 2015; Hodge and Austin 2004). Given a set of training samples, novelty detection is the task of classifying new test samples as nominal when the test data relates to the training set, or as anomalous when they significantly differ. Anomalous data is called novelties or anomalies and is assumed to be generated by a different generative process. Since novelty detection

B

Rémi Domingues [email protected] Pietro Michiardi [email protected] Jérémie Barlet [email protected] Maurizio Filippone [email protected]

1

Department of Data Science, EURECOM, 450 Route des Chappes, Sophia Antipolis, France

2

Amadeus, 485 Route du Pin Montard, Sophia Antipolis, France

123

R. Domingues et al.

can be considered a one-class classification problem, it has also been described as a semisupervised problem (Chandola et al. 2012) when the training set is exempt of outliers. While most anomaly detection problems deal with numerical data (Emmott et al. 2016; Breunig et al. 2000; Ramaswamy et al. 2000), novelty detection methods have been successfully applied to categorical data (Hodge and Austin 2004), time-series (Marchi et al. 2015; Kundzewicz and Robson 2004; Taylor and Letham 2018), discrete sequences (Chandola et al. 2008; Warrender et al. 1999; Cohen 1995) and mixed data types (Domingues et al. 2018). This paper surveys the problem of detecting anomalies in temporal data, specifically in discrete sequences of events which have a temporal order. Such a problem can be divided into two categories. The first one is change point detection. In this problem, the dataset is a single and often long sequence in which we seek anomalous subsequences composed of contiguous events. Anomalies denote a sudden change of beh

Data Loading...

A comparative evaluation of novelty detection algorithms for discrete sequences

Recommend Documents

Some New Approaches to Comparative Evaluation of Algorithms for Calculating Distances Between Genomic Sequences

A Comparative Study of Machine Learning Algorithms for Gas Leak Detection

Novelty Creation

Approximation Algorithms for Discrete Polynomial Optimization

Wireless Condition Monitoring and Embedded Novelty Detection

A comparative analysis of gradient boosting algorithms

Object Detection and Tracking Algorithms for Vehicle Counting: A Comparative Analysis

Novelty

A computational model for subjective evaluation of novelty in descriptive aptitude

A comparative study on bio-inspired algorithms for sentiment analysis

Efficient Algorithms for Model-Based Motif Discovery from Multiple Sequences

Algorithms of the Discrete Singularity Method for Computing Technologies