Dynamic clustering method for imbalanced learning based on AdaBoost


Xiaoheng Deng¹ · Yuebin Xu¹ · Lingchi Chen¹ · Weijian Zhong¹ · Alireza Jolfaei² · Xi Zheng³

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Our paper addresses learning from imbalanced data with ensemble learning. Current solutions mainly combine under-sampling, oversampling, or cost-sensitive learning with ensemble learning. However, these feature-space-based methods fail to reflect changes in the data distribution and usually come with high computational complexity and a risk of overfitting. In this paper, we propose a dynamic clustering algorithm based on the coefficient of variation (or entropy), which learns the local spatial distribution of the data and hierarchically clusters the majority class. The algorithm has low complexity and dynamically adjusts the clusters across AdaBoost iterations, adapting to the changes caused by updated sample weights. We then design an index to measure the importance of each cluster and, based on this index, propose a dynamic sampling algorithm based on maximum weight, whose effectiveness is demonstrated through visualization experiments. Finally, we propose a cost-sensitive algorithm based on Bagging and combine it with the dynamic sampling algorithm into a multi-fusion imbalanced ensemble learning algorithm. In experiments on three artificial datasets, 22 KEEL datasets, and two gene expression cancer datasets, our algorithms perform as well as or better than state-of-the-art (SOTA) methods in terms of AUC, indicating that they are not only effective imbalanced learning algorithms but also have potential for building reliable biological cyber-physical systems.

Keywords Imbalanced learning · Dynamic clustering · Under-sampling · AdaBoost · Biological cyber-physical system
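The abstract only outlines the pipeline. The Python sketch below shows one way its pieces could interact, under stated assumptions. `adaboost_reweight` is the standard discrete AdaBoost weight update for labels in {-1, +1}. Everything else is hypothetical:

* `split_by_weight_cv` stands in for the dynamic clustering step. It keeps splitting a majority cluster while the coefficient of variation of its sample weights exceeds an assumed `cv_threshold`.
* `max_weight_sample` stands in for the maximum-weight sampler.
* A cluster's summed weight is an assumed substitute for the paper's importance index.

None of these reproduce the authors' definitions. The entropy variant and the cost-sensitive Bagging stage are not shown.

```python
import math
from statistics import mean, pstdev


def adaboost_reweight(weights, y_true, y_pred):
    """Standard discrete AdaBoost step for labels in {-1, +1}.

    Returns the normalized new sample weights and the learner weight alpha.
    """
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
    alpha = 0.5 * math.log((1 - err) / err)
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(new)
    return [w / z for w in new], alpha


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def split_by_weight_cv(points, weights, cv_threshold=0.3, min_size=2):
    """HYPOTHETICAL divisive clustering of majority-class samples.

    A cluster is split into two halves, ordered along its widest feature,
    while the CV of its AdaBoost weights exceeds cv_threshold and each half
    would still hold at least min_size samples. Returns lists of indices.
    """
    clusters, pending = [], [list(range(len(points)))]
    while pending:
        idx = pending.pop()
        w = [weights[i] for i in idx]
        if len(idx) < 2 * min_size or coefficient_of_variation(w) <= cv_threshold:
            clusters.append(idx)
            continue
        # Feature with the largest spread inside this cluster.
        d = max(range(len(points[idx[0]])),
                key=lambda k: pstdev([points[i][k] for i in idx]))
        idx.sort(key=lambda i: points[i][d])
        half = len(idx) // 2
        pending.extend([idx[:half], idx[half:]])
    return clusters


def max_weight_sample(clusters, weights, n_samples):
    """HYPOTHETICAL under-sampler: summed weight stands in for the cluster
    importance index, and each cluster contributes its highest-weight samples.
    At least one sample is kept per cluster, so the total may exceed n_samples.
    """
    importance = [sum(weights[i] for i in c) for c in clusters]
    total = sum(importance)
    chosen = []
    for c, imp in zip(clusters, importance):
        k = min(len(c), max(1, round(n_samples * imp / total)))
        chosen.extend(sorted(c, key=lambda i: weights[i], reverse=True)[:k])
    return sorted(chosen)


# Toy majority class (label -1): two 1-D groups; the learner misclassifies 6 and 7.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (5.3,)]
y_true = [-1] * 8
y_pred = [-1] * 6 + [1] * 2
weights, alpha = adaboost_reweight([1 / 8] * 8, y_true, y_pred)
clusters = split_by_weight_cv(points, weights)
chosen = max_weight_sample(clusters, weights, n_samples=2)
print(clusters)  # [[6, 7], [4, 5], [0, 1, 2, 3]]
print(chosen)    # [0, 4, 6]
```

In this toy run, the misclassified samples 6 and 7 raise the weight variation. The majority class is therefore first split into two spatial groups, and the right-hand group is split again, while the uniformly weighted left group stays whole. Re-running the clustering after every boosting round, instead of once before training, would let the partition follow the weight changes the abstract describes.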

* Xiaoheng Deng. Extended author information is available on the last page of the article.


1 Introduction

Pervasive noisy data and imbalanced datasets pose severe challenges in many applications, e.g., medical diagnosis, email foldering, text classification, and online electronic transactions [25, 36]. In imbalanced datasets, the class of interest (the positive or minority class) is usually rare compared with the other classes (the negative or majority classes). For instance, female drivers are far fewer than male drivers, yet they are equally significant in assessing the causes of traffic accidents. Similarly, in network security, most samples come from safe networks, but attacked networks deserve more attention [35]. Positive samples in the minority class may be mistakenly treated as outliers or noise, which reduces classification accuracy. Class overlap and small disjuncts are two typical challenges in imbalanced classification. Class overlap causes normal data to be easily submerged, making discriminative rules
