Ensemble learning based on random super-reduct and resampling



Feng Jiang1 · Xu Yu1 · Hongbo Zhao1 · Dunwei Gong2 · Junwei Du1

© Springer Nature B.V. 2020

Abstract

Ensemble learning has been widely used for improving the performance of base classifiers. Diversity among base classifiers is considered a key issue in ensemble learning. Recently, to promote the diversity of base classifiers, ensemble methods based on multi-modal perturbation have been proposed. These methods simultaneously use two or more perturbation techniques when generating base classifiers. In this paper, from the perspective of multi-modal perturbation, we propose an ensemble approach (called 'E_RSRR') based on random super-reduct and resampling. To generate a set of accurate and diverse base classifiers, E_RSRR adopts a new multi-modal perturbation strategy. This strategy combines two perturbation techniques, namely resampling and random super-reduct. First, it perturbs the sample space via the resampling technique; second, it perturbs the feature space via the random super-reduct technique, which combines the RSS (random subspace selection) technique with the ADEFS (approximate decision entropy-based feature selection) method in rough sets. Experimental results show that E_RSRR can provide competitive solutions for ensemble learning.

Keywords Ensemble learning · Rough sets · Random super-reduct · Approximate decision entropy · Resampling · Multi-modal perturbation
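The strategy described in the abstract gives each base classifier a doubly perturbed view of the training data: a bootstrap resample of the instances (sample-space perturbation) and a randomly selected feature subspace (feature-space perturbation). The sketch below illustrates that two-step idea in Python. It is a minimal illustration only: the function names, the subspace ratio, and the omission of the ADEFS refinement step (which requires rough-set machinery not given here) are all assumptions, not the paper's actual implementation.

```python
import random


def bootstrap_sample(data, seed=None):
    """Resample the training set with replacement (bagging-style
    sample-space perturbation)."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]


def random_super_reduct(features, subspace_ratio=0.6, seed=None):
    """Pick a random feature subspace (stand-in for the RSS step).
    E_RSRR would further refine this subspace with ADEFS; that
    rough-set step is omitted in this sketch."""
    rng = random.Random(seed)
    k = max(1, int(len(features) * subspace_ratio))
    return rng.sample(features, k)


def generate_base_training_sets(data, features, n_classifiers=5):
    """Two-step multi-modal perturbation: perturb the sample space,
    then the feature space, once per base classifier."""
    tasks = []
    for i in range(n_classifiers):
        sample = bootstrap_sample(data, seed=i)           # step 1: samples
        subspace = random_super_reduct(features, seed=i)  # step 2: features
        tasks.append((sample, subspace))
    return tasks
```

Each (sample, subspace) pair would then be used to train one base classifier, and the resulting classifiers combined by a standard scheme such as majority voting.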

* Corresponding author: Junwei Du, [email protected]

1 College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, People's Republic of China

2 School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, People's Republic of China

1 Introduction

Ensemble learning has become a hot topic since the 1990s (Breiman 1996; Dietterich 2002; Li et al. 2018; Rokach 2010; Schapire 1990; Yu et al. 2018; Zhou 2012). A large number of studies show that an ensemble classifier can obtain higher classification accuracy than a base classifier (Feng and Zhou 2018). In recent years, ensemble learning has become an important research field of machine learning, and various ensemble methods have been proposed and applied to many real-world problems (Pietruczuk et al. 2017; Santos and De Barros 2019; Serafino et al. 2018; Tama and Rhee 2019). For instance, ensemble methods have been fruitfully exploited for handling imbalanced data sets (Ceci et al. 2015; Galar et al. 2012; Pio et al. 2014). It is well known that an ensemble classifier tends to obtain better performance when there is a high diversity among the base classifiers and the accuracy of each base classifier is high. It is easy to measure the accuracy of a base classifier (also called individual classifier). However, it is not straightforward to measure the diversity among base classifiers, since there is no generally accepted formal definition for diversity (Kuncheva and Whitaker 2003). Although various measures of
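Although there is no agreed-upon formal definition of diversity, the pairwise disagreement rate is one simple, widely used way to quantify it: the fraction of instances on which two base classifiers predict different labels. The sketch below is illustrative only and is not a measure specific to this paper.

```python
def disagreement(preds_a, preds_b):
    """Pairwise disagreement between two base classifiers: the fraction
    of instances on which their predicted labels differ.
    Returns 0.0 for identical predictions and 1.0 when every prediction
    differs."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have equal length")
    differing = sum(a != b for a, b in zip(preds_a, preds_b))
    return differing / len(preds_a)
```

Averaging this quantity over all pairs of base classifiers gives a single ensemble-level diversity score, which is one common way such pairwise measures are used in practice.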