UWFP-Outlier: an efficient frequent-pattern-based outlier detection method for uncertain weighted data streams

Saihua Cai¹ · Li Li¹ · Qian Li¹ · Sicong Li¹ · Shangbo Hao² · Ruizhi Sun¹,³
Received: 14 October 2019 / Revised: 26 February 2020 / Accepted: 8 April 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract In this paper, we propose an efficient frequent-pattern-based outlier detection method, namely UWFP-Outlier, for identifying implicit outliers in uncertain weighted data streams. To reduce the time cost of UWFP-Outlier, in the weighted frequent pattern mining phase we introduce the concepts of maximal weight and maximal probability to form a compact anti-monotonic property, thereby reducing the number of potentially extensible patterns. To detect outliers accurately, in the outlier detection phase we design two deviation indices that measure the deviation degree of each transaction in the uncertain weighted data streams by taking into account more of the factors that may influence it; transactions with large deviation degrees are then judged to be outliers. The experimental results indicate that UWFP-Outlier can accurately detect outliers in uncertain weighted data streams with a lower time cost.

Keywords Outlier detection · Weighted frequent pattern mining · Deviation indices · Uncertain weighted data streams
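The abstract names the ingredients of the pruning step but not their definitions. The following Python sketch is a hypothetical illustration under stated assumptions, not the authors' UWFP-Outlier algorithm: it assumes each item in a transaction carries an existence probability, a pattern's weight is the mean of its item weights, and its weighted expected support is that weight times its expected support. Because probabilities are at most 1 and a mean weight never exceeds the largest item weight, `max_w * max_p[item] * prefix_es` bounds the weighted expected support of a candidate and of all its supersets, which is one way a maximal weight and a maximal probability can yield an anti-monotonic pruning rule. The `deviation` function is a swapped-in score in the spirit of the frequent pattern outlier factor (FPOF), not either of the paper's two deviation indices. All names (`DB`, `WEIGHTS`, `mine`, `deviation`, `min_wes`) and the data are made up, and the sketch processes one static window rather than a stream.

```python
from math import prod

# Hypothetical toy window: each transaction maps item -> existence probability.
DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.8, "b": 0.7},
    {"a": 0.9, "c": 0.6},
    {"a": 0.7, "b": 0.9, "c": 0.4},
    {"d": 0.9, "e": 0.3},  # shares no items with the other transactions
]
# Hypothetical item weights in [0, 1].
WEIGHTS = {"a": 0.6, "b": 0.8, "c": 0.5, "d": 0.9, "e": 0.4}


def expected_support(itemset, db):
    """Sum over transactions of the product of the items' probabilities (0 if an item is absent)."""
    return sum(prod(t.get(i, 0.0) for i in itemset) for t in db)


def avg_weight(itemset, weights):
    """Assumed pattern weight: the mean weight of its items."""
    return sum(weights[i] for i in itemset) / len(itemset)


def mine(db, weights, min_wes):
    """Depth-first search for patterns whose weighted expected support reaches min_wes."""
    items = sorted(weights)
    max_w = max(weights.values())  # no pattern's mean weight can exceed this
    max_p = {i: max(t.get(i, 0.0) for t in db) for i in items}  # maximal probability per item
    result = {}

    def extend(prefix, prefix_es, candidates):
        for k, item in enumerate(candidates):
            # Bound for prefix+item and all its supersets:
            # mean weight <= max_w and expected support <= max_p[item] * prefix_es.
            if max_w * max_p[item] * prefix_es < min_wes:
                continue
            pattern = prefix + (item,)
            es = expected_support(pattern, db)
            if max_w * es < min_wes:
                continue  # neither this pattern nor any extension can qualify
            wes = avg_weight(pattern, weights) * es
            if wes >= min_wes:
                result[pattern] = wes
            # Recurse even if wes < min_wes: weighted support alone is not anti-monotone.
            extend(pattern, es, candidates[k + 1:])

    extend((), float(len(db)), items)  # the empty pattern has expected support |db|
    return result


def deviation(t, patterns):
    """FPOF-style score: 1 minus the probability-weighted share of frequent patterns in t."""
    total = sum(patterns.values())
    covered = sum(w * prod(t.get(i, 0.0) for i in p) for p, w in patterns.items())
    return 1.0 - covered / total if total else 1.0


if __name__ == "__main__":
    patterns = mine(DB, WEIGHTS, min_wes=1.0)
    for idx, t in enumerate(DB):
        print(idx, round(deviation(t, patterns), 3))
```

On this toy data only {a}, {b} and {a, b} reach `min_wes = 1.0`, and the last transaction, which contains none of them, receives the maximal deviation of 1.0. Candidates such as {a, b, c} are discarded by the bound without their weights ever being combined.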

* Corresponding author: Ruizhi Sun

1 College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
2 Inner Mongolia Power Group Mengdian Information & Telecommunication Co., Ltd, Hohhot, China
3 Scientific research base for Integrated Technologies of Precision Agriculture (animal husbandry), The Ministry of Agriculture, Beijing 100083, China

1 Introduction

In recent years, data have played an increasingly important role in daily life. Through processing technologies such as feature selection [1, 2], information retrieval [3], clustering [4, 5], classification [6, 7], and forecasting [8, 9], data can provide services for production and daily life. Secure and credible data are therefore very important, because they determine the accuracy of the services provided. However, abnormal data (also called outliers) often exist in the collected data, and their presence can seriously degrade its quality, with an immeasurable impact on data-based services. Compared with normal data, outliers [10] are data generated by a different mechanism or by damaged equipment; thus, outliers have two distinct characteristics: (1) they appear infrequently, and (2) they differ significantly from most data instances. To discover outliers effectively in large-scale data, numerous outlier detection methods have been proposed in recent years, an
