SMOTE-WENN: Solving class imbalance and small sample problems by oversampling and distance scaling



Hongjiao Guan 1,2,3 · Yingtao Zhang 3 · Min Xian 4 · H. D. Cheng 5 · Xianglong Tang 3

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Many practical applications suffer from imbalanced data classification, in which the minority class has a degraded recognition rate. The primary causes are the scarcity of minority-class samples and the intrinsically complex distribution characteristics of imbalanced datasets. The imbalanced classification problem is even more serious on small sample datasets. To address both the small sample and class imbalance problems, a hybrid resampling method is proposed. The proposed method combines an oversampling approach (the synthetic minority oversampling technique, SMOTE) with a novel data cleaning approach (the weighted edited nearest neighbor rule, WENN). First, SMOTE generates synthetic minority-class examples by linear interpolation. Then, WENN detects and deletes unsafe majority- and minority-class examples using a weighted distance function and the k-nearest neighbor (kNN) rule. The weighted distance function scales up a commonly used distance by accounting for local imbalance and spatial sparsity. Extensive experiments on synthetic and real datasets validate the superiority of the proposed SMOTE-WENN over three state-of-the-art resampling methods.

Keywords Imbalanced data classification · Small sample datasets · Oversampling · Data cleaning
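The two stages described above can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation: the function names, the brute-force distance computation, and the use of the classical (unweighted) ENN rule are our own simplifications. The paper's WENN replaces the plain Euclidean distance used here with its weighted, imbalance- and sparsity-aware distance before applying the kNN rule.

```python
import numpy as np

def smote_sample(X_min, n_new, k=5, rng=None):
    """SMOTE-style oversampling: create n_new synthetic minority examples,
    each placed by linear interpolation between a randomly chosen minority
    seed point and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest minority neighbors
    seeds = rng.integers(0, n, size=n_new)      # random seed points
    nbrs = nn[seeds, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation coefficients in [0, 1)
    return X_min[seeds] + gap * (X_min[nbrs] - X_min[seeds])

def enn_clean(X, y, k=3):
    """Classical edited nearest neighbor (ENN) cleaning: delete any example
    whose label disagrees with the majority of its k nearest neighbors."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    keep = np.array([np.sum(y[nn[i]] == y[i]) > k / 2 for i in range(len(X))])
    return X[keep], y[keep]
```

In a SMOTE-WENN-style pipeline, `smote_sample` would first be applied to the minority class alone, and the cleaning step would then be run on the combined (original plus synthetic) dataset to remove unsafe examples from both classes.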

✉ Hongjiao Guan
  [email protected]

1 School of Cyber Security, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
2 Shandong Computer Science Center (National Supercomputer Center in Jinan), Shandong Provincial Key Laboratory of Computer Networks, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China
3 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
4 Department of Computer Science, University of Idaho, Idaho Falls, USA
5 School of Computer Science, Utah State University, Logan, UT 84322, USA

1 Introduction

Imbalanced data classification is common in practical applications [1, 2], such as medical diagnosis and defect prediction. It has long been a challenge in data mining and machine learning. For two-class datasets, imbalance means that the number of examples in one class (called the positive or minority class) is far smaller than that of the other class (called the negative or majority class). Imbalanced datasets lead to

the performance deterioration of traditional classification methods. In particular, the recognition rate of the minority class decreases severely. However, the minority class is precisely the class of interest from an application point of view. Furthermore, the misclassification cost of the minority class is usually higher than that of the majority class. The imbalanced classification problem can be explained from two aspects. One is the inappropriate optimization metrics used in traditional learning algorithms. These algorithms as