ORIGINAL RESEARCH
Divide-and-conquer ensemble self-training method based on probability difference

Tingting Li 1,2 · Jia Lu 1,2

Received: 14 October 2019 / Accepted: 6 April 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

* Jia Lu
jia‑[email protected]

Tingting Li
[email protected]

1 College of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, People's Republic of China
2 Chongqing Center of Engineering Technology Research on Digital Agricultural Service, Chongqing 401331, People's Republic of China
Abstract
The self-training method can train an effective classifier by exploiting both labeled and unlabeled instances: in each iteration, the highest-confidence unlabeled instances are selected and added to the training set. Unfortunately, the structure information of these high-confidence instances is so similar that repeated iterations lead to local over-fitting. To avoid this over-fitting and improve the classification performance of self-training, a novel divide-and-conquer ensemble self-training framework based on probability difference is proposed. First, the probability difference of each instance is computed from the category probabilities output by each classifier, and each classifier's unlabeled instances are divided into low-fuzzy and high-fuzzy sets according to this difference. Then a divide-and-conquer strategy is applied: instances judged low-fuzzy by all classifiers are labeled directly, while high-fuzzy instances are labeled manually. Finally, the newly labeled instances are added to the training set for the next self-training iteration. By selecting low-fuzzy instances with accurate structure information and high-fuzzy instances with more comprehensive structure information, the method expands the training set and effectively improves generalization performance. It is well suited to noisy data sets and can capture structure information even with few labeled instances. The effectiveness of the proposed method is verified by comparative experiments on University of California Irvine (UCI) data sets.

Keywords Ensemble self-training · Probability difference · Low-fuzzy instances · High-fuzzy instances · Divide-and-conquer strategy
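To make the partition step in the abstract concrete, the following is a minimal sketch, not the authors' code, of how the probability difference and the low-/high-fuzzy split could be computed for an ensemble of fitted scikit-learn-style classifiers. The function names `probability_difference` and `partition_by_fuzziness` and the threshold `theta` are illustrative assumptions, not values prescribed by the paper.

```python
import numpy as np

def probability_difference(proba):
    """Gap between the two largest class probabilities for each instance."""
    top2 = np.sort(proba, axis=1)[:, -2:]  # two highest probabilities, ascending
    return top2[:, 1] - top2[:, 0]

def partition_by_fuzziness(classifiers, X_unlabeled, theta=0.4):
    """Split unlabeled instances into low-fuzzy and high-fuzzy index sets.

    An instance counts as low-fuzzy only when every classifier in the
    ensemble is confident about it (probability difference >= theta);
    everything else is high-fuzzy. theta = 0.4 is an assumed example
    threshold.
    """
    diffs = np.stack([probability_difference(clf.predict_proba(X_unlabeled))
                      for clf in classifiers])  # shape: (n_classifiers, n_instances)
    low_fuzzy = np.where((diffs >= theta).all(axis=0))[0]
    high_fuzzy = np.setdiff1d(np.arange(len(X_unlabeled)), low_fuzzy)
    return low_fuzzy, high_fuzzy
```

Following the abstract, low-fuzzy instances would then be pseudo-labeled directly by the ensemble, high-fuzzy instances routed to manual labeling, and both added to the training set for the next iteration.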
1 Introduction

In the real world it is very difficult to obtain sufficient labeled instances, while large numbers of unlabeled instances are easy to collect. This has motivated semi-supervised learning (SSL) (Chen and Yan 2014; Rasmus et al. 2015), which trains a classifier from a small number of labeled instances together with a large number of unlabeled ones and lies between supervised
learning (Sadhasivam and Kalivaradhan 2019) and unsupervised learning (Zhang et al. 2017). SSL comprises many methods, such as self-training, co-training, graph-based methods, and generative models (Li et al. 2019a; Li and Zhu 2019). Among them, self-training is one of the most widely used, and it has been successfully applied in many practical tasks.
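As a generic illustration of the self-training method described here (not the authors' implementation), the basic loop can be sketched as follows; the `LogisticRegression` base learner, the 0.9 confidence threshold, and the name `self_training` are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_labeled, y_labeled, X_unlabeled,
                  confidence=0.9, max_iter=10):
    """Standard self-training: iteratively pseudo-label the most
    confident unlabeled instances and retrain on the enlarged set."""
    X_l, y_l, X_u = X_labeled.copy(), y_labeled.copy(), X_unlabeled.copy()
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_iter):
        clf.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        mask = proba.max(axis=1) >= confidence
        if not mask.any():
            break  # no unlabeled instance clears the confidence threshold
        # move high-confidence instances, with their pseudo-labels,
        # into the training set
        X_l = np.vstack([X_l, X_u[mask]])
        y_l = np.concatenate([y_l, clf.classes_[proba[mask].argmax(axis=1)]])
        X_u = X_u[~mask]
    return clf
```

Because the instances that clear the confidence threshold in each round tend to share very similar structure information, repeating this loop can over-fit locally, which is precisely the weakness the probability-difference partition sketched above is designed to avoid.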