ORIGINAL RESEARCH
Divide-and-conquer ensemble self-training method based on probability difference

Tingting Li 1,2 · Jia Lu 1,2

Received: 14 October 2019 / Accepted: 6 April 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

* Jia Lu
jia‑[email protected]

Tingting Li
[email protected]

1 College of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, People's Republic of China
2 Chongqing Center of Engineering Technology Research on Digital Agricultural Service, Chongqing 401331, People's Republic of China
Abstract
The self-training method can train an effective classifier by exploiting both labeled and unlabeled instances: in each iteration, the highest-confidence unlabeled instances are selected and added to the training set. Unfortunately, the structure information of these high-confidence instances is so similar that repeated iterations lead to local over-fitting. To avoid this over-fitting and improve the classification performance of self-training, a novel divide-and-conquer ensemble self-training framework based on probability difference is proposed. First, the probability difference of each instance is computed from the category probabilities output by each classifier, and each classifier's unlabeled instances are divided into low-fuzzy and high-fuzzy sets according to this difference. Then a divide-and-conquer strategy is applied: instances judged low-fuzzy by all classifiers are labeled directly, while high-fuzzy instances are labeled manually. Finally, the newly labeled instances are added to the training set for the next self-training iteration. By selecting low-fuzzy instances with accurate structure information and high-fuzzy instances with more comprehensive structure information, the method expands the training set and effectively improves generalization performance. It is well suited to noisy data sets and can capture structure information even with few labeled instances. The effectiveness of the proposed method is verified by comparative experiments on University of California Irvine (UCI) data sets.

Keywords Ensemble self-training · Probability difference · Low-fuzzy instances · High-fuzzy instances · Divide-and-conquer strategy
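To make the partition step in the abstract concrete, the following is a minimal sketch, not the authors' code, of how the probability difference and the low-/high-fuzzy split could be computed for an ensemble of fitted scikit-learn-style classifiers. The function names `probability_difference` and `partition_by_fuzziness` and the threshold `theta` are illustrative assumptions, not values prescribed by the paper.

```python
import numpy as np

def probability_difference(proba):
    """Gap between the two largest class probabilities for each instance."""
    top2 = np.sort(proba, axis=1)[:, -2:]  # two highest probabilities, ascending
    return top2[:, 1] - top2[:, 0]

def partition_by_fuzziness(classifiers, X_unlabeled, theta=0.4):
    """Split unlabeled instances into low-fuzzy and high-fuzzy index sets.

    An instance counts as low-fuzzy only when every classifier in the
    ensemble is confident about it (probability difference >= theta);
    everything else is high-fuzzy. theta = 0.4 is an assumed example
    threshold.
    """
    diffs = np.stack([probability_difference(clf.predict_proba(X_unlabeled))
                      for clf in classifiers])  # shape: (n_classifiers, n_instances)
    low_fuzzy = np.where((diffs >= theta).all(axis=0))[0]
    high_fuzzy = np.setdiff1d(np.arange(len(X_unlabeled)), low_fuzzy)
    return low_fuzzy, high_fuzzy
```

Following the abstract, low-fuzzy instances would then be pseudo-labeled directly by the ensemble, high-fuzzy instances routed to manual labeling, and both added to the training set for the next iteration.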
1 Introduction

In the real world it is very difficult to obtain sufficient labeled instances, while large numbers of unlabeled instances are easy to collect. This has motivated semi-supervised learning (SSL) (Chen and Yan 2014; Rasmus et al. 2015), which trains a classifier from a small number of labeled instances together with a large number of unlabeled ones and lies between supervised
learning (Sadhasivam and Kalivaradhan 2019) and unsupervised learning (Zhang et al. 2017). SSL comprises many methods, such as self-training, co-training, graph-based methods, and generative models (Li et al. 2019a; Li and Zhu 2019). Among them, self-training is one of the most widely used, and it has been successfully applied in many practical tasks.
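As a generic illustration of the self-training method described here (not the authors' implementation), the basic loop can be sketched as follows; the `LogisticRegression` base learner, the 0.9 confidence threshold, and the name `self_training` are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_labeled, y_labeled, X_unlabeled,
                  confidence=0.9, max_iter=10):
    """Standard self-training: iteratively pseudo-label the most
    confident unlabeled instances and retrain on the enlarged set."""
    X_l, y_l, X_u = X_labeled.copy(), y_labeled.copy(), X_unlabeled.copy()
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_iter):
        clf.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        mask = proba.max(axis=1) >= confidence
        if not mask.any():
            break  # no unlabeled instance clears the confidence threshold
        # move high-confidence instances, with their pseudo-labels,
        # into the training set
        X_l = np.vstack([X_l, X_u[mask]])
        y_l = np.concatenate([y_l, clf.classes_[proba[mask].argmax(axis=1)]])
        X_u = X_u[~mask]
    return clf
```

Because the instances that clear the confidence threshold in each round tend to share very similar structure information, repeating this loop can over-fit locally, which is precisely the weakness the probability-difference partition sketched above is designed to avoid.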