An effective method using clustering-based adaptive decomposition and editing-based diversified oversamping for multi-class imbalanced datasets

Xiangtao Chen¹ · Lan Zhang¹ · Xiaohui Wei¹ · Xinguo Lu¹

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Multi-class imbalanced classification tasks arise in many real-world applications, and they pose two major challenges: class imbalance, in which some classes occur far less frequently than others, and class overlap, in which different classes contain similar samples. Both complicate the classification task. The decomposition-based strategy is an effective way to improve performance on multi-class imbalanced classification tasks. However, current studies based on this strategy fail to address class imbalance and class overlap simultaneously. We therefore propose a method, the clustering-based adaptive decomposition and editing-based diversified oversampling procedure (CluAD-EdiDO), to solve both problems. CluAD-EdiDO consists of two key components: clustering-based adaptive decomposition and an editing-based diversified oversampling technique. The former groups similar samples of the dataset into clusters (i.e., "sub-problems"). The latter is applied independently within each cluster to combat imbalance and overlap, reducing the impact of the majority classes in overlapping regions and oversampling the minority classes appropriately. Furthermore, a diversified ensemble learning framework is adopted to select the best classification algorithm for each sub-problem. Extensive experiments on 17 real-world datasets demonstrate that our method outperforms the compared methods on multi-class imbalanced datasets.

Keywords Multi-class imbalance · Overlapping problem · Clustering · Oversampling
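The two-stage idea summarized in the abstract (decompose the dataset into clusters, then edit and oversample inside each cluster) can be illustrated with a minimal sketch. This is not the authors' CluAD-EdiDO implementation: the abstract does not name the concrete algorithms, so plain k-means, an ENN-style editing rule, and SMOTE-style interpolation are swapped in as assumptions, and every function name and parameter below (`kmeans`, `edit_majority`, `oversample`, `decompose_and_rebalance`, `k`, `n_neighbors`) is hypothetical. The per-cluster classifier selection is omitted.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed stand-in); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assign


def edit_majority(X, y, n_neighbors=3):
    """ENN-style editing (assumed stand-in): drop samples of the largest
    class whose nearest neighbours mostly belong to other classes."""
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    keep = []
    for i, (p, label) in enumerate(zip(X, y)):
        if label == majority:
            nearest = sorted((j for j in range(len(X)) if j != i),
                             key=lambda j: math.dist(p, X[j]))[:n_neighbors]
            other = sum(y[j] != label for j in nearest)
            if nearest and other > len(nearest) / 2:
                continue  # treated as lying in an overlapping region
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def oversample(X, y, rng):
    """SMOTE-style interpolation (assumed stand-in): raise every class in
    the cluster to the size of its largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_new, y_new = list(X), list(y)
    for label, n in counts.items():
        members = [p for p, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            a, b = rng.choice(members), rng.choice(members)
            t = rng.random()
            X_new.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
            y_new.append(label)
    return X_new, y_new


def decompose_and_rebalance(X, y, k=2, seed=0):
    """Cluster the data, then edit and oversample each cluster on its own.
    Returns one (X, y) pair per non-empty cluster ("sub-problem")."""
    rng = random.Random(seed)
    assign = kmeans(X, k, seed=seed)
    subproblems = []
    for c in range(k):
        Xc = [p for p, a in zip(X, assign) if a == c]
        yc = [lab for lab, a in zip(y, assign) if a == c]
        if not Xc:
            continue
        Xc, yc = edit_majority(Xc, yc)
        Xc, yc = oversample(Xc, yc, rng)
        subproblems.append((Xc, yc))
    return subproblems
```

In this sketch each returned sub-problem is class-balanced on its own, so a separate classifier could be trained per cluster; how the paper chooses the number of clusters and the amount of oversampling is not stated in the abstract and is not modeled here.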

¹ College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China

1 Introduction

The imbalance problem, also known as the imbalanced dataset problem, refers to a situation in which one or more classes (minority classes) are underrepresented compared with the remaining ones (majority classes). This skewed distribution makes it difficult for many classification algorithms, such as naive Bayes [1], to perform well, because they assume that the training set is roughly balanced and that each instance is equally important. The class imbalance problem has been discussed in machine learning for years because it is encountered in many real-world classification tasks, such as social media analysis [2], action recognition [3] and text classification [4]. However, most practitioners focus on binary imbalanced datasets. The multi-class imbalance learning problem, where a dataset consists of samples from multiple different classes, is much more diff
