A boosting Self-Training Framework based on Instance Generation with Natural Neighbors for K Nearest Neighbor
Junnan Li¹ & Qingsheng Zhu¹
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract

The semi-supervised self-training method is one of the most successful methodologies for semi-supervised classification. Mislabeling is the most challenging issue in self-training methods, and ensemble learning is one of the common techniques for dealing with it: an ensemble classifier can solve or alleviate mislabeling by improving prediction accuracy in the self-training process. However, most ensemble learning methods may not perform well in self-training because it is difficult to train an effective ensemble classifier with only a small amount of labeled data. Inspired by successful boosting methods, we introduce a new boosting self-training framework based on instance generation with natural neighbors (BoostSTIG). BoostSTIG is compatible with most boosting methods and self-training methods: it can use most boosting methods to solve or alleviate the mislabeling of existing self-training methods by improving prediction accuracy in the self-training process. In addition, an instance generation method based on natural neighbors is proposed to enlarge the initial labeled data in BoostSTIG, which makes boosting methods more suitable for self-training. In experiments, we apply the BoostSTIG framework to 2 self-training methods and 4 boosting methods, and validate BoostSTIG by comparison with some state-of-the-art techniques on real data sets. Intensive experiments show that BoostSTIG can improve the performance of the tested self-training methods and train an effective k nearest neighbor classifier.

Keywords Semi-supervised learning (SSL) · Semi-supervised classification (SSC) · Self-training · Boosting · Instance generation · Natural neighbors
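To make the general idea concrete, the following is a minimal sketch of a boosting-based self-training loop, not the authors' exact BoostSTIG algorithm: a boosted ensemble (AdaBoost is used here as one example boosting method) labels the most confident unlabeled instances each round, and a final k nearest neighbor classifier is trained on the enlarged labeled set. The function name and the parameters `rounds`, `per_round`, `n_estimators`, and `n_neighbors` are illustrative assumptions.

```python
# A hedged sketch of boosting-based self-training (not the exact BoostSTIG
# procedure from the paper): a boosted ensemble predicts labels for the
# unlabeled pool, the most confident predictions are moved into the labeled
# set, and the final target classifier is a k nearest neighbor.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier

def boosting_self_training(X_l, y_l, X_u, rounds=10, per_round=20):
    X_l, y_l, X_u = np.asarray(X_l), np.asarray(y_l), np.asarray(X_u)
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        # the boosted ensemble replaces a single base classifier when
        # predicting labels for unlabeled data in the self-training loop
        ens = AdaBoostClassifier(n_estimators=50).fit(X_l, y_l)
        proba = ens.predict_proba(X_u)
        take = min(per_round, len(X_u))
        idx = np.argsort(proba.max(axis=1))[-take:]  # most confident points
        X_l = np.vstack([X_l, X_u[idx]])
        y_l = np.concatenate([y_l, ens.classes_[proba[idx].argmax(axis=1)]])
        X_u = np.delete(X_u, idx, axis=0)
    # train the final k nearest neighbor on the enlarged labeled set
    return KNeighborsClassifier(n_neighbors=3).fit(X_l, y_l)
```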
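The instance generation component of BoostSTIG rests on natural neighbors. As a rough illustration of the underlying parameter-free neighbor search (following the general natural-neighbor idea from prior work, with a simplified stopping rule; the function name and `max_r` cap are our assumptions):

```python
# A simplified sketch of natural neighbor search: expand the neighborhood
# size r until every point appears in some other point's r-nearest-neighbor
# list (i.e., has at least one reverse neighbor). The published algorithm
# uses a slightly different stopping criterion to tolerate outliers.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_neighbor_search(X, max_r=None):
    X = np.asarray(X)
    n = len(X)
    max_r = max_r or n - 1
    nn = NearestNeighbors().fit(X)
    for r in range(1, max_r + 1):
        # r nearest neighbors of each point (column 0 is the point itself)
        _, idx = nn.kneighbors(X, n_neighbors=r + 1)
        reverse_count = np.zeros(n, dtype=int)
        for i in range(n):
            for j in idx[i, 1:]:
                reverse_count[j] += 1
        if np.all(reverse_count > 0):  # every point has a reverse neighbor
            return r, idx[:, 1:]       # r plays the role of the "natural" k
    return max_r, idx[:, 1:]
```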
* Qingsheng Zhu
[email protected]
¹ Chongqing Key Laboratory of Software Theory & Technology, College of Computer Science, Chongqing, China

1 Introduction

Classification [1] has attracted great attention from scholars in machine learning and pattern recognition. Because of its importance and great value, it has been applied to text classification, biomedical treatment, spam classification, risk management, digital imaging, etc. [2–6]. In traditional classification tasks, an effective prediction model is trained on sufficient labeled data. Unfortunately, it is not easy to obtain a large number of labeled samples due to high labor costs and huge time consumption. This is the main motivation that led to the inception of semi-supervised classification (SSC) [7, 8]. SSC can use both labeled and unlabeled data to train a prediction model and complete classification tasks. Two main objectives of SSC are transductive and inductive classification [9, 10]. In transductive classification, a trained prediction model is used to predict the labels of a subset of unlabeled data available during training.