
THEORETICAL ADVANCES

Cost-sensitive sample shifting in feature space

Zhenchong Zhao¹ · Xiaodan Wang¹ · Chongming Wu² · Lei Lei¹

Received: 6 August 2019 / Accepted: 8 June 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

¹ Air and Missile Defense College, Air Force Engineering University, Xi'an 710051, Shaanxi, People's Republic of China
² College of Business, Xijing University, Xi'an 710123, Shaanxi, People's Republic of China

* Corresponding author: Xiaodan Wang ([email protected])

Abstract  The asymmetry of misclassification costs is a common problem in many realistic applications. As one of the most familiar preprocessing approaches, cost-sensitive resampling has drawn great attention due to its ease of implementation and broad applicability. However, current methods mainly concentrate on changing the size of the training set, which alters the original distribution shape and can leave classifiers over-fitted or unstable. To address this, a new method named cost-sensitive kernel shifting is proposed. First, the training data are remapped from the input space to the feature space by a kernel function, in which a distance metric is defined. Then the outliers are eliminated, and the informative samples, including border and edge samples, are selected according to neighborhood and geometrical information in the mapped space. Finally, the positions of all the selected samples in the feature space are shifted, with a moving step length defined in proportion to both the ratio and the difference of the misclassification costs. Every step operates only on the kernel matrix, thanks to the kernel trick. Experiments on both synthetic and public datasets verify the effectiveness of the proposed method.

Keywords  Cost-sensitive learning · Resampling · Feature space · SVM
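To illustrate the kernel-trick machinery behind such a method, the sketch below shifts a single sample toward its class mean in feature space while touching only the kernel matrix. It is a minimal illustration under assumed choices (an RBF kernel, the class mean as the shift target, and an ad hoc step-length formula combining the cost ratio and cost difference); the helper names feature_space_distances and shift_toward_class_mean are hypothetical, not from the paper.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def feature_space_distances(K):
    """Pairwise feature-space distances recovered from a kernel matrix K,
    using ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y)."""
    d = np.diag(K)
    sq = d[:, None] - 2.0 * K + d[None, :]
    return np.sqrt(np.maximum(sq, 0.0))


def shift_toward_class_mean(K, idx, class_idx, eta):
    """Shift sample `idx` toward its class mean in feature space,
    phi'(x) = (1 - eta) * phi(x) + eta * mean_j phi(x_j),
    by updating the kernel matrix alone (no explicit feature map)."""
    K_new = K.copy()
    # Cross terms: k'(x, z) = (1 - eta) k(x, z) + eta * mean_j k(x_j, z)
    new_row = (1.0 - eta) * K[idx] + eta * K[class_idx].mean(axis=0)
    K_new[idx, :] = new_row
    K_new[:, idx] = new_row
    # Self term: <phi'(x), phi'(x)> expanded from the convex combination
    K_new[idx, idx] = (
        (1.0 - eta) ** 2 * K[idx, idx]
        + 2.0 * eta * (1.0 - eta) * K[idx, class_idx].mean()
        + eta ** 2 * K[np.ix_(class_idx, class_idx)].mean()
    )
    return K_new


# Toy usage: shift the first minority sample with a step length that
# grows with both the cost ratio and the cost difference (assumed form).
rng = np.random.RandomState(0)
X = rng.randn(20, 2)
y = np.array([1] * 5 + [0] * 15)              # 5 minority, 15 majority
K = rbf_kernel(X, gamma=0.5)
c_min, c_maj = 5.0, 1.0                       # misclassification costs
eta = min(1.0, 0.01 * (c_min / c_maj) * (c_min - c_maj))
minority = np.flatnonzero(y == 1)
K_shifted = shift_toward_class_mean(K, minority[0], minority, eta)
# The shifted sample ends up closer to its class mates in feature space.
print(feature_space_distances(K)[minority[0], minority[1:]].mean())
print(feature_space_distances(K_shifted)[minority[0], minority[1:]].mean())
```

The point of the update formulas is that a convex combination in feature space induces the same convex combination on the corresponding kernel rows, so a kernel classifier such as an SVM can be trained directly on K_shifted.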

1 Introduction

In real-world classification problems, the data amounts of different classes usually follow an asymmetric distribution, i.e., data for a few classes are abundant while data for the others are scarce. The skewed distribution draws the attention of classification algorithms toward the classes with abundant samples (the majority) rather than the underrepresented ones (the minority). This is the so-called 'class-imbalance problem', a common phenomenon in a wide variety of real applications that poses a great challenge to traditional learning methods [1, 2]. An inherent characteristic of imbalanced learning is that the minority class is usually more important to users: misclassifying a minority object may incur a greater cost than misclassifying majority ones. The field of data mining that deals with learning problems with non-uniform costs is known as cost-sensitive learning [3], and it is one of the most effective approaches for solving class-imbalance problems. Different from traditional methods, cost-sensitive algorithms utilize cost information to modify the training data distributions, the classification boundaries, or the decision thresholds so as to minimize the total classification cost rather than the total error [3, 4]; the threshold-moving case is illustrated in the sketch below. Many kinds of costs have b
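To make the threshold-moving strategy concrete, here is a minimal sketch based on standard Bayes decision theory under a binary cost matrix (background illustration, not the method proposed in this paper):

```python
def cost_sensitive_threshold(c_fp: float, c_fn: float) -> float:
    """Bayes-optimal probability threshold for predicting positive.
    Predict positive when the expected cost of predicting negative
    exceeds that of predicting positive:
        p(+|x) * c_fn > (1 - p(+|x)) * c_fp,
    which rearranges to p(+|x) > c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)


# With false negatives five times as costly as false positives, the
# threshold drops from 0.5 to 1/6, moving the decision boundary toward
# the majority class so that the costly minority class is missed less.
print(cost_sensitive_threshold(c_fp=1.0, c_fn=5.0))  # 0.1666...
```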