Uncertain distance-based outlier detection with arbitrarily shaped data objects


Fabrizio Angiulli1 · Fabio Fassetti1

Received: 7 February 2020 / Revised: 24 September 2020 / Accepted: 24 September 2020 / © The Author(s) 2020

Abstract Enabling information systems to face anomalies in the presence of uncertainty is a compelling and challenging task. In this work we consider the problem of unsupervised outlier detection in large collections of data objects modeled by means of arbitrary multidimensional probability density functions. We present a novel definition of uncertain distance-based outlier under the attribute level uncertainty model, according to which an uncertain object is an object that always exists but whose actual value is modeled by a multivariate pdf. According to this definition, an uncertain object is declared to be an outlier on the basis of the expected number of its neighbors in the dataset. To the best of our knowledge, this is the first work that considers the unsupervised outlier detection problem on data objects modeled by means of arbitrarily shaped multidimensional distribution functions. We present the UDBOD algorithm, which efficiently detects the outliers in an input uncertain dataset by taking advantage of three optimized phases: parameter estimation, candidate selection, and candidate filtering. An experimental campaign is presented, including a sensitivity analysis, a study of the effectiveness of the technique, a comparison with related algorithms, also in the presence of high dimensional data, and a discussion about the behavior of our technique in real case scenarios.

Keywords Nearest neighbors · Outlier detection · Uncertain data · Unsupervised learning
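The idea of scoring an uncertain object by the expected number of its neighbors can be illustrated with a naive Monte Carlo estimate. The following is only a minimal sketch and not the UDBOD algorithm: the names `expected_neighbors` and `gaussian`, the Gaussian pdfs, the radius, the threshold `k`, and the rule "flag an object when its estimate falls below `k`" are all assumptions introduced here for illustration.

```python
import math
import random


def gaussian(center, sigma):
    """Hypothetical uncertain object: an isotropic Gaussian pdf around `center`."""
    def draw(rng):
        return tuple(rng.gauss(c, sigma) for c in center)
    return draw


def expected_neighbors(samplers, i, radius, n_draws=2000, seed=0):
    """Monte Carlo estimate of the expected number of objects lying
    within `radius` of object `i`, where every object is given as a
    function that draws one realization from its pdf."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        xi = samplers[i](rng)              # one realization of object i
        for j, sampler in enumerate(samplers):
            if j == i:
                continue
            xj = sampler(rng)              # one realization of object j
            if math.dist(xi, xj) <= radius:
                hits += 1
    return hits / n_draws


# Toy dataset (assumed): three nearby objects and one far away.
objects = [
    gaussian((0.0, 0.0), 0.1),
    gaussian((0.2, 0.1), 0.1),
    gaussian((0.1, -0.2), 0.1),
    gaussian((5.0, 5.0), 0.1),
]

radius, k = 1.0, 1                          # assumed parameters
scores = [expected_neighbors(objects, i, radius) for i in range(len(objects))]
outliers = [i for i, s in enumerate(scores) if s < k]
```

Because each sampler is just a function returning one realization, any pdf shape can be plugged in, which is the point of the arbitrarily shaped setting. This brute-force estimate costs a number of distance computations quadratic in the dataset size for each round of draws, which is the kind of cost an optimized multi-phase algorithm would aim to avoid.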

1 Introduction

Traditional data analysis techniques deal with feature vectors having deterministic values. Thus, data uncertainty is usually ignored in the problem formulation. However, uncertainty

A preliminary version of this work appears in Angiulli and Fassetti (2013).

Fabrizio Angiulli · Fabio Fassetti

1 DIMES, University of Calabria, 87036, Rende, CS, Italy

Journal of Intelligent Information Systems

arises in real data in many ways, since the data may contain errors or may be only partially complete (Lindley 2006). The uncertainty may result from the limitations of the equipment, since physical devices are often imprecise due to measurement errors. Another source of uncertainty is repeated measurements, e.g. sea surface temperature could be recorded multiple times during a day. Also, in some applications data values are continuously changing, such as the positions of devices or observations associated with natural phenomena, and these quantities can be represented by using an uncertain model. Simply disregarding uncertainty may lead to less accurate conclusions or even inexact ones. This has created a need for uncertain data management techniques (Aggarwal and Yu 2009) managing data records typically represented by probability distributions (Mohri 2003; Kriegel and
