Automatic optimization of outlier detection ensembles using a limited number of outlier examples
APPLICATIONS
Niko Reunanen¹ · Tomi Räty² · Timo Lintonen²
Received: 7 January 2020 / Accepted: 2 May 2020
© The Author(s) 2020
Abstract
In data analysis, outliers are deviating and unexpected observations. Outlier detection is important because outliers can contain critical and interesting information. We propose an approach for optimizing outlier detection ensembles using a limited number of outlier examples. In our work, a limited number of outlier examples is defined as anything from a single example to 10% of the available outliers. The optimized outlier detection ensembles consist of outlier detection algorithms that provide an outlier score and have adjustable parameters. The automatic optimization determines parameter values that enhance the discrimination between inliers and outliers. This increases the efficiency of the outlier detection. Outliers are rare by definition, which makes optimization with only a few examples beneficial. Obtaining examples of outliers can be prohibitively challenging, so the outlier examples should be used efficiently.
Keywords Bagging · Outlier detection · Outlier detection ensemble · Semi-supervised outlier detection
¹ Hellon Oy, Pursimiehenkatu 26 C, 00150 Helsinki, Finland
² VTT Technical Research Centre of Finland, Kaitoväylä 1, 90571 Oulu, Finland
1 Introduction
Outlier detection is an important form of data analysis [16]. An outlier is an unexpected data observation that does not match the existing data or the assumptions of how the observations are generated [31]. Outliers deviate significantly from the expectations [29]. Normal and expected data observations are called inliers. An outlier can carry interesting information, because it is unusual, unexpected and new in comparison with inliers [14]. Other names for outliers include fault [22], intrusion [25,85] and anomaly [48]. Outlier detection has been successfully applied in different fields [8,15,24–26,28,36,38,62,68,80,81].
We propose an approach for optimizing outlier detection ensembles by automatically adjusting the parameters of the combined outlier detection algorithms using a limited number of outlier examples. The outlier detection algorithms are called detectors [43]. An outlier detection ensemble is a combination of detectors; see Sect. 2.1 and [2,3,89] for more information. In the context of our work, a limited number of outlier examples ranges from a single example to 10% of the outliers available for the experiments. The optimization improves the efficiency of the outlier detection, which is empirically validated in Sect. 4.3. The optimization method is introduced in detail in Sect. 3. Section 2 defines the outlier detection algorithms and outlier detection ensembles in detail. Section 5 surveys the related work. Section 6 discusses the acquired results and concludes this article. The optimization is s