An effective method using clustering-based adaptive decomposition and editing-based diversified oversamping for multi-class imbalanced datasets
Xiangtao Chen¹ · Lan Zhang¹ · Xiaohui Wei¹ · Xinguo Lu¹
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract For multi-class imbalanced classification tasks, which occur in many real-world applications, the major challenges are class imbalance, which arises when some classes are much less frequent than others, and class overlap, which arises when samples of different classes occupy similar regions of the feature space. Both complicate the classification task. The decomposition-based strategy is an effective way to improve performance on multi-class imbalanced classification tasks. However, current studies based on this strategy have failed to address class imbalance and class overlap simultaneously. In this paper, we therefore propose an effective method, namely the clustering-based adaptive decomposition and editing-based diversified oversampling procedure (CluAD-EdiDO), to solve these problems. CluAD-EdiDO consists of two key components: clustering-based adaptive decomposition and an editing-based diversified oversampling technique. The former groups similar samples of the dataset into clusters (i.e., “subproblems”). The latter is applied independently within each cluster to combat imbalance and overlap, reducing the influence of the majority classes in overlapping regions and oversampling the minority classes appropriately. Furthermore, a diversified ensemble learning framework is adopted to select the best classification algorithm for each subproblem. Extensive experiments on 17 real-world datasets demonstrate that our method outperforms the compared methods on multi-class imbalanced datasets.
Keywords Multi-class imbalance · Overlapping problem · Clustering · Oversampling
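The abstract describes the pipeline only at a high level. As a rough illustration, the stdlib-only Python sketch below decomposes a dataset into clusters, then, inside each cluster, removes largest-class samples that sit among other classes and oversamples the remaining classes by interpolation. The choice of plain k-means for decomposition, a Wilson-ENN-style editing rule, SMOTE-style interpolation, the helper names (`kmeans`, `edit_majority`, `smote`, `cluad_edido_sketch`) and the parameters `n_clusters` and `k` are all assumptions made for illustration. They are not the authors' adaptive decomposition or diversified oversampling procedures, and the per-subproblem classifier selection is omitted.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for the adaptive
    decomposition); returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def neighbours(i, points, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return others[:k]


def edit_majority(points, labels, k=3):
    """ENN-style editing (an assumption, not the paper's exact rule): drop a
    sample of the largest class when most of its k neighbours belong to
    other classes, i.e. when it lies in an overlapping region."""
    majority = Counter(labels).most_common(1)[0][0]
    keep = []
    for i, lab in enumerate(labels):
        if lab == majority:
            nbrs = neighbours(i, points, k)
            foreign = sum(labels[j] != lab for j in nbrs)
            if foreign > len(nbrs) / 2:
                continue  # treated as overlap and removed
        keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]


def smote(points, labels, k=3, seed=0):
    """SMOTE-style oversampling (assumed): grow every class with at least two
    samples to the size of the largest class in this subproblem."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_pts, new_lab = list(points), list(labels)
    for cls, n in counts.items():
        if n < 2:
            continue  # interpolation needs at least two samples of the class
        members = [p for p, lab in zip(points, labels) if lab == cls]
        for _ in range(target - n):
            i = rng.randrange(n)
            base = members[i]
            nbrs = sorted((j for j in range(n) if j != i),
                          key=lambda j: dist(base, members[j]))[:k]
            nb = members[rng.choice(nbrs)]
            gap = rng.random()  # random point on the segment base -> nb
            new_pts.append([b + gap * (q - b) for b, q in zip(base, nb)])
            new_lab.append(cls)
    return new_pts, new_lab


def cluad_edido_sketch(X, y, n_clusters=2, k=3):
    """Split (X, y) into per-cluster subproblems, then edit and oversample
    each subproblem independently."""
    assign = kmeans(X, n_clusters)
    subproblems = []
    for c in range(n_clusters):
        Xc = [x for x, a in zip(X, assign) if a == c]
        yc = [lab for lab, a in zip(y, assign) if a == c]
        if not Xc:
            continue
        Xc, yc = edit_majority(Xc, yc, k)
        subproblems.append(smote(Xc, yc, k))
    return subproblems


# Hypothetical toy 2-D data: two separated regions, each with its own imbalance.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
     [0.9, 0.1], [0.6, 0.7],
     [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5],
     [10.2, 10.8], [10.9, 10.1]]
y = ["a"] * 6 + ["b"] * 2 + ["c"] * 5 + ["b"] * 2

subproblems = cluad_edido_sketch(X, y, n_clusters=2, k=3)
for Xc, yc in subproblems:
    print(len(Xc), dict(Counter(yc)))
```

Each returned `(Xc, yc)` pair is one subproblem. In the framework described in the abstract, a classifier chosen separately for each pair would then be trained on it; comparing candidate learners by cross-validation would be one plausible way to make that choice, but that is likewise an assumption.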
¹ College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China

1 Introduction
The class imbalance problem, also known as learning from imbalanced datasets, refers to a situation in which one or more classes (minority classes) are underrepresented compared with the remaining ones (majority classes). This skewed distribution makes it difficult for many classification algorithms, such as naive Bayes [1], to achieve good performance, because they assume that the training set is roughly balanced and that every instance is equally important. The class imbalance problem has been studied in machine learning for many years because it is encountered in many real-world classification tasks, such as social media analysis [2], action recognition [3] and text classification [4]. However, most practitioners focus on binary imbalanced datasets. The multi-class imbalance learning problem, in which a dataset consists of samples from more than two classes, is much more diff