Study on Statistical Outlier Detection and Labelling
Paweł D. Domański, Institute of Control and Computation Engineering, Warsaw University of Technology, Warsaw 00-665, Poland
Abstract: Outliers accompany control engineers in their everyday practice. Industrial reality is much richer than elementary linear, quadratic, Gaussian assumptions. Outliers appear for various and varying, often unknown, reasons. They attract research interest in statistical and regression analysis and in data mining. There are many interesting algorithms and approaches to outlier detection, labelling, filtering and, finally, interpretation. Unfortunately, their impact on control systems has not received sufficient attention in research; their influence is frequently unnoticed, ignored or not mentioned. This work focuses on outlier detection and labelling in the context of control system performance analysis. Selected statistical data-driven approaches are analyzed, as they can be easily implemented with limited a priori knowledge. The study consists of a simulation part followed by an analysis of real control data. Different generation mechanisms are simulated, such as overlapping Gaussian processes, symmetric and asymmetric, artificially shifted points and fat-tailed distributions. Simulation observations are compared against datasets from industrial control loops. The work concludes with a practical procedure intended to help practitioners deal with outliers in control engineering temporal data.
Keywords: Outlier detection, control loop quality, statistical analysis, robust estimation, heavy tails.
1 Introduction
An outlier is a strange phenomenon, and different perspectives may lead to different interpretations. Simple definitions describe outliers as "values, dubious in the eyes of the researcher" (Dixon[1]) or as contaminants (Weiner[2]). One of the most popular definitions was formulated by Hawkins[3], who calls an outlier "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism". Johnson and Wichern[4] define an outlier as "an observation in a data set which appears to be inconsistent with the remainder of that set of data". Barnett and Lewis[5] say that "an outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs". As one can see, outliers go by various other names, for instance anomalies, contaminants or fringeliers, reflecting "unusual events which occur more often than seldom"[2]. These strange phenomena may have disastrous effects on further data analysis, whatever it may be[6]. They may increase signal variance and reduce the power of statistical tests performed during analysis[7]. They destroy signal normality and introduce fat tails[8]. Finally, Rousseeuw and Leroy[9] point out that they significantly bias regression analysis. Following presented definitions, we may try to investigate their or