An Overview of Outliers and Detection Methods in General for Time Series from IoT Devices


Abstract. As Internet of Things (IoT) devices proliferate, a huge amount of data lies unused. At the same time, reliable and accurate time series analysis plays a key role in modern intelligent systems for achieving efficient management. One reason the data go unused is that outliers prevent many algorithms from working effectively. Manual data cleaning takes up most of the time before a solution can actually work on the data. Thus, data cleaning, especially fully automated outlier detection, is a bottleneck that should be resolved as soon as possible. Previous work has investigated this topic but lacks an overview that covers outlier categorization and detection categorization at the same time. This work aims to start filling this gap and to find a direction for making outlier detection and labelling more automated and general, so that they suit most time series data from IoT devices.

Keywords: Survey · Anomaly novelty detection · Time series · Internet of things

1 Introduction

Time series analysis is widely used in intelligent transport, smart medical assistance, weather forecasting and financial systems, among other time-dynamic science and engineering topics. Achieving the desired results requires a lot of clean, good-quality data. However, dirty data is a big problem nowadays. Monitoring and collecting data are important, but clean data is needed before any analysis starts. No matter what resources are available, it is nearly always necessary to clean the data and label its outliers. Outliers in raw data prevent algorithms from achieving their best performance. In this paper, we give an overview of outliers and detection methods to see how dirty data can be tackled.

2 Categorization of Outliers

For data in general, outliers are commonly categorized into three general types: point, contextual and collective [1, 2].

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 Q. Liu et al. (Eds.): CENet 2020, AISC 1274, pp. 1180–1186, 2021. https://doi.org/10.1007/978-981-15-8462-6_135


Point outliers, shown in Fig. 1a [3], are often used when analysing multi-dimensional data. They are also known as global outliers because the original point-based methods do not consider the local context. Global outliers are usually detected by applying some kind of threshold.
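As an illustration of threshold-based detection of global outliers, the following is a minimal sketch and not the method of any cited work. The function name global_outliers, the median absolute deviation (MAD) rule, the cutoff k = 3 and the sample readings are all assumptions chosen for the example.

```python
from statistics import median


def global_outliers(values, k=3.0):
    """Return indices of values farther than k * MAD from the median.

    The MAD rule and the default k are illustrative assumptions;
    any other global threshold could be substituted.
    """
    med = median(values)
    # Median absolute deviation: a robust estimate of spread.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: at least half of the values equal the median,
        # so flag every value that differs from it.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if abs(v - med) > k * mad]


# Hypothetical temperature readings in °C with one extreme value.
readings = [21.0, 21.5, 20.8, 22.1, 80.0, 21.3, 20.9]
print(global_outliers(readings))  # [4]
```

Because the threshold is computed once over all values, the check ignores local context, which is why outliers found this way are called global.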


Fig. 1. (a) Point (global) outliers are marked as triangles [3]. (b) A collective outlier in electrocardiographic signal [5]. (c) Two collective outliers due to football matches [6]. (d) An additive outlier is marked as A while a consecutive outlier is marked as B. (e) A long-time range consecutive outlier in time series.

In contrast, contextual outliers are useful when the context of an observation matters. For example, a temperature of 80 °C is a global outlier, while 30 °C is a contextual outlier in the Nordic area but normal in India. Contextual outliers are al