
THEORETICAL ADVANCES

Cost-sensitive sample shifting in feature space

Zhenchong Zhao¹ · Xiaodan Wang¹ · Chongming Wu² · Lei Lei¹

Received: 6 August 2019 / Accepted: 8 June 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

¹ Air and Missile Defense College, Air Force Engineering University, Xi'an 710051, Shaanxi, People's Republic of China
² College of Business, Xijing University, Xi'an 710123, Shaanxi, People's Republic of China

* Corresponding author: Xiaodan Wang ([email protected])

Abstract  The asymmetry of misclassification costs is a common problem in many realistic applications. As one of the most familiar preprocessing approaches, cost-sensitive resampling has drawn great attention due to its ease of implementation and broad applicability. However, current methods mainly concentrate on changing the size of the training set, which alters the original distribution shape and can leave classifiers over-fitted or unstable. To address this, a new method named cost-sensitive kernel shifting is proposed. First, the training data are remapped from the input space to the feature space by a kernel function, in which a distance metric is defined. Then the outliers are eliminated, and the informative samples, including border and edge samples, are selected according to neighborhood and geometrical information in the mapped space. Finally, the positions of all the selected samples in the feature space are shifted, with a moving step length defined in proportion to both the ratio and the difference of the misclassification costs. Every step operates only on the kernel matrix, thanks to the kernel trick. Experiments on both synthetic and public datasets verify the effectiveness of the proposed method.

Keywords  Cost-sensitive learning · Resampling · Feature space · SVM
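To illustrate the kernel-trick machinery behind such a method, the sketch below shifts a single sample toward its class mean in feature space while touching only the kernel matrix. It is a minimal illustration under assumed choices (an RBF kernel, the class mean as the shift target, and an ad hoc step-length formula combining the cost ratio and cost difference); the helper names feature_space_distances and shift_toward_class_mean are hypothetical, not from the paper.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def feature_space_distances(K):
    """Pairwise feature-space distances recovered from a kernel matrix K,
    using ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y)."""
    d = np.diag(K)
    sq = d[:, None] - 2.0 * K + d[None, :]
    return np.sqrt(np.maximum(sq, 0.0))


def shift_toward_class_mean(K, idx, class_idx, eta):
    """Shift sample `idx` toward its class mean in feature space,
    phi'(x) = (1 - eta) * phi(x) + eta * mean_j phi(x_j),
    by updating the kernel matrix alone (no explicit feature map)."""
    K_new = K.copy()
    # Cross terms: k'(x, z) = (1 - eta) k(x, z) + eta * mean_j k(x_j, z)
    new_row = (1.0 - eta) * K[idx] + eta * K[class_idx].mean(axis=0)
    K_new[idx, :] = new_row
    K_new[:, idx] = new_row
    # Self term: <phi'(x), phi'(x)> expanded from the convex combination
    K_new[idx, idx] = (
        (1.0 - eta) ** 2 * K[idx, idx]
        + 2.0 * eta * (1.0 - eta) * K[idx, class_idx].mean()
        + eta ** 2 * K[np.ix_(class_idx, class_idx)].mean()
    )
    return K_new


# Toy usage: shift the first minority sample with a step length that
# grows with both the cost ratio and the cost difference (assumed form).
rng = np.random.RandomState(0)
X = rng.randn(20, 2)
y = np.array([1] * 5 + [0] * 15)              # 5 minority, 15 majority
K = rbf_kernel(X, gamma=0.5)
c_min, c_maj = 5.0, 1.0                       # misclassification costs
eta = min(1.0, 0.01 * (c_min / c_maj) * (c_min - c_maj))
minority = np.flatnonzero(y == 1)
K_shifted = shift_toward_class_mean(K, minority[0], minority, eta)
# The shifted sample ends up closer to its class mates in feature space.
print(feature_space_distances(K)[minority[0], minority[1:]].mean())
print(feature_space_distances(K_shifted)[minority[0], minority[1:]].mean())
```

The point of the update formulas is that a convex combination in feature space induces the same convex combination on the corresponding kernel rows, so a kernel classifier such as an SVM can be trained directly on K_shifted.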

1 Introduction

In real-world classification problems, the data amounts of different classes usually follow an asymmetric distribution, i.e., data for a few classes are abundant while data for the others are scarce. The skewed distribution draws the attention of classification algorithms toward the classes with abundant samples (the majority) rather than the underrepresented ones (the minority). This is the so-called 'class-imbalance problem', a common phenomenon in a wide variety of real applications that poses a great challenge to traditional learning methods [1, 2]. An inherent characteristic of imbalanced learning is that the minority class is usually more important to users: misclassifying a minority object may incur a greater cost than misclassifying majority ones. The field of data mining that deals with learning problems with non-uniform costs is known as cost-sensitive learning [3], and it is one of the most effective approaches for solving class-imbalance problems. Different from traditional methods, cost-sensitive algorithms utilize cost information to modify the training data distributions, the classification boundaries, or the decision thresholds so as to minimize the total classification cost rather than the total error [3, 4]; the threshold-moving case is illustrated in the sketch below. Many kinds of costs have b
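To make the threshold-moving strategy concrete, here is a minimal sketch based on standard Bayes decision theory under a binary cost matrix (background illustration, not the method proposed in this paper):

```python
def cost_sensitive_threshold(c_fp: float, c_fn: float) -> float:
    """Bayes-optimal probability threshold for predicting positive.
    Predict positive when the expected cost of predicting negative
    exceeds that of predicting positive:
        p(+|x) * c_fn > (1 - p(+|x)) * c_fp,
    which rearranges to p(+|x) > c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)


# With false negatives five times as costly as false positives, the
# threshold drops from 0.5 to 1/6, moving the decision boundary toward
# the majority class so that the costly minority class is missed less.
print(cost_sensitive_threshold(c_fp=1.0, c_fn=5.0))  # 0.1666...
```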