A framework for imprecise robust one-class classification models
ORIGINAL ARTICLE
Lev V. Utkin
Received: 26 May 2012 / Accepted: 23 November 2012
© Springer-Verlag Berlin Heidelberg 2012

L. V. Utkin, Department of Industrial Control and Automation, St. Petersburg State Forest Technical University, Institutski per. 5, 194021 St. Petersburg, Russia. e-mail: [email protected]
Abstract  A framework for constructing robust one-class classification models is proposed in the paper. It is based on Walley's imprecise extensions of contaminated models, which produce a set of probability distributions of data points instead of a single empirical distribution. The minimax and minimin strategies are used to choose an optimal probability distribution from the set and to construct optimal separating functions. It is shown that the algorithm for computing the optimal parameters is determined by the extreme points of the probability set and reduces to a finite number of standard SVM tasks with weighted data points. Important special cases of the models, including the pari-mutuel, constant odds-ratio and contaminated models and Kolmogorov–Smirnov bounds, are studied. Experimental results with synthetic and real data illustrate the proposed models.

Keywords  Machine learning · Novelty detection · Classification · Minimax strategy · Support vector machine · Quadratic programming
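To fix ideas, a common form of the contaminated (epsilon-contamination) model underlying such imprecise extensions, written here in standard robust-statistics notation (which may differ from the paper's own), is the set of distributions

\mathcal{M}(P_0, \varepsilon) = \{ (1 - \varepsilon) P_0 + \varepsilon Q : Q \in \mathcal{P} \}, \quad 0 \le \varepsilon \le 1,

where P_0 is the empirical distribution of the data points, \mathcal{P} is the set of all probability distributions, and \varepsilon controls the amount of contamination. The minimax strategy then selects the worst-case distribution from \mathcal{M}(P_0, \varepsilon), while the minimin strategy selects the best-case one.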
1 Introduction

An important problem in statistical machine learning is classification, which can be regarded as the task of assigning objects to classes (groups) according to their properties or features. However, for many real-world problems, the task is not to
classify but to detect novel or abnormal instances [7, 8, 12, 30, 32]. Comprehensive reviews of novelty detection approaches are provided by Markou and Singh [25], Bartkowiak [1], Khan and Madden [20], and Hodge and Austin [17]. Novelty detection is the identification of new or unknown data that a machine learning system was not aware of during training; in particular, it aims to detect anomalous observations [10, 11, 33]. A typical feature of novelty detection models is that only unlabeled samples are available, so some assumptions on anomalies have to be made in order to distinguish between normal and anomalous future observations. One of the most common ways to define anomalies is to say that anomalies are not concentrated [31]. The problem of statistical outlier detection is also closely related to that of novelty detection.

The first way to solve the novelty detection problem is to estimate the real-valued density of the data and then to threshold it at some value. As many authors point out (see, for instance, [12]), this approach is likely to fail for sparse high-dimensional data. A better way is to model the support of the (unknown) data distribution directly from the data, that is, to estimate a binary-valued function f that is positive in a region where the density is high and negative elsewhere. This leads to a single-class learning problem.
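As a concrete illustration of this support-estimation idea, the following minimal sketch fits the standard (non-robust) one-class SVM with scikit-learn. The data, parameter values (gamma, nu) and variable names are chosen here purely for illustration; the sketch shows the baseline model that the text describes, not the paper's imprecise robust extension.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = rng.randn(200, 2)                           # unlabeled "normal" sample
X_test = np.vstack([rng.randn(20, 2),                 # points from the same source
                    rng.uniform(4, 6, size=(5, 2))])  # points far from the support

# nu upper-bounds the fraction of training points treated as outliers;
# the learned decision function f is positive inside the estimated
# high-density region and negative elsewhere.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
clf.fit(X_train)

labels = clf.predict(X_test)   # +1: inside the support, -1: novelty
print(labels)

In the imprecise setting of the paper, a single fit of this kind is replaced by a finite family of weighted SVM problems, one per extreme point of the probability set, whose solutions are compared under the minimax or minimin strategy.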