Study on Statistical Outlier Detection and Labelling

Paweł D. Domański
Institute of Control and Computation Engineering, Warsaw University of Technology, Warsaw 00-665, Poland

Abstract: Outliers accompany control engineers in their real-life activity. Industrial reality is much richer than elementary linear, quadratic and Gaussian assumptions. Outliers appear for various and varying, often unknown, reasons. They attract research interest in statistical and regression analysis and in data mining, where many interesting algorithms and approaches to outlier detection, labelling, filtering and, finally, interpretation have been proposed. Unfortunately, their impact on control systems has not received sufficient attention in research: their influence frequently goes unnoticed, is ignored or is not mentioned. This work focuses on outlier detection and labelling in the context of control system performance analysis. Selected statistical data-driven approaches are analyzed, as they can be easily implemented with limited a priori knowledge. The work consists of a simulation study followed by an analysis of real control data. Different generation mechanisms are simulated, such as overlapping Gaussian processes (symmetric and asymmetric), artificially shifted points and fat-tailed distributions. The simulation observations are confronted with industrial control loop datasets. The work concludes with a practical procedure that should help practitioners deal with outliers in control engineering temporal data.

Keywords: Outlier detection, control loop quality, statistical analysis, robust estimation, heavy tails.
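The abstract mentions simulating several outlier-generation mechanisms and labelling outliers with simple statistical rules that need little a priori knowledge. The Python sketch below illustrates that idea under stated assumptions. The function names (`simulate_contaminated`, `mad_outlier_labels`, `excess_kurtosis`), the contamination fractions, the shift size, the degrees of freedom and the 3-MAD threshold are all hypothetical choices for illustration, not the parameters or procedures used in the study. The labelling rule is a common robust z-score rule (median as location, scaled MAD as scale), picked here only as one example of a statistical data-driven approach.

```python
import numpy as np


def simulate_contaminated(kind: str, n: int = 2000, seed: int = 0) -> np.ndarray:
    """Generate a hypothetical test signal with one outlier-generation mechanism.

    kind (all parameter values are illustrative assumptions):
      "shifted"    - N(0, 1) with 2% of points artificially shifted by +8
      "overlap"    - 95% N(0, 1) and 5% N(0, 5^2): symmetric overlapping processes
      "asymmetric" - 95% N(0, 1) and 5% N(4, 2^2): asymmetric overlapping processes
      "fat_tailed" - Student t distribution with 2 degrees of freedom
    """
    rng = np.random.default_rng(seed)
    if kind == "shifted":
        x = rng.normal(0.0, 1.0, n)
        idx = rng.choice(n, size=n // 50, replace=False)  # distinct points to shift
        x[idx] += 8.0
        return x
    if kind == "overlap":
        mask = rng.random(n) < 0.05  # which samples come from the wide process
        return np.where(mask, rng.normal(0.0, 5.0, n), rng.normal(0.0, 1.0, n))
    if kind == "asymmetric":
        mask = rng.random(n) < 0.05  # which samples come from the shifted process
        return np.where(mask, rng.normal(4.0, 2.0, n), rng.normal(0.0, 1.0, n))
    if kind == "fat_tailed":
        return rng.standard_t(2.0, n)
    raise ValueError(f"unknown kind: {kind}")


def mad_outlier_labels(x: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Label points whose robust z-score exceeds `threshold`.

    Location is the median; scale is the MAD multiplied by 1.4826, which makes
    it consistent with the standard deviation for Gaussian data.
    """
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    if mad == 0.0:
        # Degenerate scale (e.g. constant signal): label nothing.
        return np.zeros(x.shape, dtype=bool)
    return np.abs(x - med) / mad > threshold


def excess_kurtosis(x: np.ndarray) -> float:
    """Sample excess kurtosis: close to 0 for Gaussian data, large for fat tails."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)


if __name__ == "__main__":
    for kind in ("shifted", "overlap", "asymmetric", "fat_tailed"):
        x = simulate_contaminated(kind)
        labels = mad_outlier_labels(x)
        print(f"{kind:>10}: std={x.std():5.2f}  "
              f"excess kurtosis={excess_kurtosis(x):7.2f}  "
              f"labelled={int(labels.sum()):4d} ({100 * labels.mean():.1f}%)")
```

Running the script prints, for each simulated mechanism, the sample standard deviation, the excess kurtosis and the share of labelled points. Comparing these values shows how differently the mechanisms behave, for example how fat-tailed data inflates kurtosis. The median/MAD rule is used instead of a mean/standard-deviation rule because the outliers themselves inflate the latter.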

1 Introduction

An outlier is a strange phenomenon, and varying perspectives may give it different interpretations. Simple definitions describe outliers as values "dubious in the eyes of the researcher" (Dixon[1]) or as contaminants (Weiner[2]). One of the most popular definitions has been formulated by Hawkins[3], who calls an outlier "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism". Johnson and Wichern[4] define an outlier as "an observation in a data set which appears to be inconsistent with the remainder of that set of data". Barnett and Lewis[5] say that "an outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs". Outliers also go by various other names, for instance anomalies, contaminants or fringeliers, the last reflecting "unusual events which occur more often than seldom"[2]. These strange phenomena may have disastrous effects on any further data analysis, whatever form it takes[6]. They may increase signal variance and reduce the power of statistical tests performed during the analysis[7]. They destroy signal normality and introduce fat tails[8]. Finally, Rousseeuw and Leroy[9] point out that they significantly bias regression analysis. Following the presented definitions, we may try to investigate their or
