A New Algorithm for High-Dimensional Outlier Detection Based on Constrained Particle Swarm Intelligence

In this paper we present an algorithm for outlier detection in high-dimensional spaces based on constrained particle swarm optimization techniques. The concept of outliers is defined as sparsely populated patterns in lower dimensional subspaces. The searc

PDF / 320,591 Bytes
8 Pages / 430 x 660 pts Page_size
46 Downloads / 190 Views

DOWNLOAD

REPORT

Abstract. In this paper we present an algorithm for outlier detection in high-dimensional spaces based on constrained particle swarm optimization techniques. The concept of outliers is deﬁned as sparsely populated patterns in lower dimensional subspaces. The search for best abnormally sparse subspaces is done by an innovative use of particle swarm optimization methods with a speciﬁcally designed particle coding and conversion strategy as well as some dimensionality-preserving updating techniques. Experimental results show that the proposed algorithm is feasible and eﬀective for high-dimensional outlier detection problems.

1

Introduction

Outlier detection has now become a hot issue in the area of data mining with numerous applications. Most of the existing algorithms for outlier detection use concepts of proximity to deﬁne and detect outliers, such as the notion of distancebased or density-based outliers [1]-[6]. However, as explained in [7], these algorithms are not appropriate for high dimensional cases because the data are sparse and the notion of proximity fails to retain eﬀectiveness. Actually, the sparsity of high dimensional data implies that every point is an almost equally good outlier from the perspective of proximity-based deﬁnitions [6][7]. Given the fact that most applications as mentioned above are high dimensional problems, it is of signiﬁcance to reasonably deﬁne outliers in a high dimensional space and to design speciﬁc algorithms for their detection. Recent research results indicate that mining outliers in lower-dimensional projections is feasible for high dimensional data. On one hand, it is in practice not qualitatively meaningful to detect outliers in full dimensional space. On the other hand, only the subsets of the attributes are aﬀected in some applications, such as credit card fraud, and so on. Along this line, Aggarwal and Yu [7] presented a new way of deﬁning outliers in a high dimensional space as well as a new technique for their detection. The point is to observe the density distributions of projections from the data that could be measured by the so called sparsity coeﬃcients and a data point is considered an outlier, if it is located in some abnormally low density subspace. Hence, the outlier detection in this context boils down to ﬁnding those combinations of dimensions with most abnormally sparse data. This turns out to G. Wang et al. (Eds.): RSKT 2008, LNAI 5009, pp. 516–523, 2008. c Springer-Verlag Berlin Heidelberg 2008

A New Algorithm for High-Dimensional Outlier Detection

517

be a very diﬃcult combinatorial optimization problem since the combinations of dimensions exponentially increase with increasing dimensionality and it is hard to examine all possible subsets of dimensions. Actually, a brute-force algorithm will become computationally untenable even for a problem with moderate dimensionality. To solve this problem, Aggarwal and Yu [7] developed an eﬀective evolutionary computation based algorithm, denoted hereafter as Geno , by making an innovative use of genetic al

Data Loading...

A New Algorithm for High-Dimensional Outlier Detection Based on Constrained Particle Swarm Intelligence

Recommend Documents

An Improved Particle Swarm Algorithm Based on Cultural Algorithm for Constrained Optimization

Particle Swarm Optimization Based Swarm Intelligence for Active Learning Improvement: Application on Medical Data Classi

A Novel Algorithm for Text Classification Based on KNN and Chaotic Binary Particle Swarm Optimization

An Optimized Approach of Outlier Detection Algorithm for Outlier Attributes on Data Streams

Abstraction-Based Outlier Detection for Image Data

An Improved Discrete Particle Swarm Optimization Algorithm

Hybridizing salp swarm algorithm with particle swarm optimization algorithm for recent optimization functions

A Performance Class-Based Particle Swarm Optimizer

An enhanced class topper algorithm based on particle swarm optimizer for global optimization

A Multi-AGV Optimal Scheduling Algorithm Based on Particle Swarm Optimization

A Novel Multiple Sequence Alignment Algorithm Based on Artificial Bee Colony and Particle Swarm Optimization

Multilevel Thresholding Method for Image Segmentation Based on an Adaptive Particle Swarm Optimization Algorithm