An apparent paradox: a classifier based on a partially classified sample may have smaller expected error rate than that if the sample were completely classified

Daniel Ahfock · Geoffrey J. McLachlan

Received: 14 January 2020 / Accepted: 27 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract

There has been increasing interest in using semi-supervised learning to form a classifier. As is well known, the (Fisher) information in an unclassified feature with unknown class label is less (considerably less for weakly separated classes) than that of a classified feature whose class label is known. Hence, in the case where the absence of class labels does not depend on the data, the expected error rate of a classifier formed from the classified and unclassified features in a partially classified sample is greater than it would be if the sample were completely classified. We propose to treat the labels of the unclassified features as missing data and to introduce a framework for their missingness, as in the pioneering work of Rubin (Biometrika 63:581–592, 1976) on missingness in incomplete-data analysis. An examination of several partially classified data sets in the literature suggests that the unclassified features do not occur at random in the feature space, but rather tend to be concentrated in regions of relatively high entropy. This suggests that the missingness of the labels can be modelled by representing the conditional probability of a missing label for a feature via a logistic model with a covariate depending on the entropy of the feature, or an appropriate proxy for it. We consider here the case of two normal classes with a common covariance matrix, where for computational convenience the square of the discriminant function is used as the covariate in the logistic model in place of the negative log entropy. Rather paradoxically, we show that the classifier so formed from the partially classified sample may have a smaller expected error rate than if the sample were completely classified.

Keywords Normal discrimination · Semi-supervised learning · Model for missing class labels · Relative efficiency of classifiers
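To make the missingness mechanism in the abstract concrete, the sketch below simulates two normal classes with a common covariance matrix and generates missing labels through a logistic model in the squared discriminant function, so that the unlabelled features concentrate near the decision boundary, i.e. in the high-entropy region. This is an illustrative reconstruction under stated assumptions, not the authors' code: the coefficient names (xi0, xi1) and all numeric values are made up for the example.

```python
# Minimal sketch (assumed parameters) of the missingness mechanism described
# in the abstract: labels go missing with probability given by a logistic
# model in d(y)^2, the squared linear discriminant function, which serves as
# a proxy for the (negative log) entropy of the feature.
import numpy as np
from scipy.special import expit  # logistic function

rng = np.random.default_rng(0)

# Two normal classes with a common covariance matrix (identity, for simplicity)
n, p = 500, 2
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
z = rng.integers(0, 2, size=n)                          # true class labels
y = np.where(z[:, None] == 0, mu1, mu2) + rng.standard_normal((n, p))

# Fisher linear discriminant function for known parameters and Sigma = I:
# d(y) = (mu1 - mu2)' Sigma^{-1} (y - (mu1 + mu2)/2)
d = (y - (mu1 + mu2) / 2) @ (mu1 - mu2)

# Logistic missingness model: with xi1 < 0, features near the decision
# boundary (small d^2, high entropy) are more likely to be unlabelled.
xi0, xi1 = 1.0, -2.0                                    # assumed values
p_missing = expit(xi0 + xi1 * d**2)
missing = rng.random(n) < p_missing

print(f"proportion unlabelled: {missing.mean():.2f}")
print(f"mean |d| of unlabelled features: {np.abs(d[missing]).mean():.2f}")
print(f"mean |d| of labelled features:   {np.abs(d[~missing]).mean():.2f}")
```

Running the sketch, the mean |d| of the unlabelled features is noticeably smaller than that of the labelled ones, reproducing the pattern the authors report in real partially classified data sets: missing labels are not missing completely at random but cluster where the classes overlap.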
Geoffrey J. McLachlan (corresponding author)
[email protected]

Daniel Ahfock
[email protected]

School of Mathematics and Physics, University of Queensland, St. Lucia, QLD 4072, Australia

This research was funded by the Australian Government through the Australian Research Council (Project Numbers DP170100907 and IC170100035).

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11222-020-09971-5) contains supplementary material, which is available to authorized users.

1 Introduction

We consider the problem of forming a classifier from training data that are not completely classified. That is, the feature vectors in the training sample have all been observed, but the class labels are missing for some of them, and so the training data constitute a partially classified sample. This problem goes back at least to the mid-seventies.