ORIGINAL ARTICLE
Bayesian Citation-KNN with distance weighting

Liangxiao Jiang · Zhihua Cai · Dianhong Wang · Harry Zhang
Received: 2 October 2012 / Accepted: 1 February 2013 © Springer-Verlag Berlin Heidelberg 2013
Abstract Multi-instance (MI) learning is receiving growing attention in the machine learning research field. In MI learning, examples are represented by a bag of instances instead of a single instance. K-nearest-neighbor (KNN) is a simple and effective classification model in traditional supervised learning. Two of its variants, Bayesian-KNN (BKNN) and Citation-KNN (CKNN), have been proposed and are widely used for solving multi-instance classification problems. However, CKNN still applies the simplest majority vote approach to the references and citers when classifying unseen bags. In this paper, we propose an improved algorithm called Bayesian Citation-KNN (BCKNN). For each unseen bag, BCKNN first finds its k references and q citers, and then applies a Bayesian approach to the k references and a distance-weighted majority vote approach to the q citers. The experimental results on several benchmark datasets show that our BCKNN is generally better than BKNN and CKNN. Moreover, BCKNN maintains almost the same order of computational overhead as CKNN.
L. Jiang (✉) · Z. Cai
Department of Computer Science, China University of Geosciences, Wuhan 430074, China
e-mail: [email protected]

D. Wang
Department of Electronic Engineering, China University of Geosciences, Wuhan 430074, China

H. Zhang
Faculty of Computer Science, University of New Brunswick, Fredericton E3B5A3, Canada
Keywords Multi-instance learning · KNN · Bayesian-KNN · Citation-KNN · Bayesian Citation-KNN · Distance weighting
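As a rough illustration of the procedure summarized in the abstract, the sketch below combines a Laplace-smoothed Bayesian vote over the k references with an inverse-distance-weighted vote over the q citers. It assumes the minimal Hausdorff bag distance commonly used with Citation-KNN; the function names, the additive score combination, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def min_hausdorff(bag_a, bag_b):
    # Minimal Hausdorff distance between two bags (n_i x d arrays):
    # the smallest instance-to-instance Euclidean distance.
    diffs = bag_a[:, None, :] - bag_b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min()

def bcknn_predict(bags, labels, unseen, k=2, q=4, eps=1e-12):
    labels = np.asarray(labels)
    d = np.array([min_hausdorff(unseen, b) for b in bags])

    # References: the k training bags nearest to the unseen bag.
    refs = np.argsort(d)[:k]

    # Citers: training bags that would rank the unseen bag among
    # their own q nearest neighbours.
    citers = [i for i in range(len(bags))
              if sum(min_hausdorff(bags[i], bags[j]) < d[i]
                     for j in range(len(bags)) if j != i) < q]

    classes = np.unique(labels)
    cit_total = sum(1.0 / (d[i] + eps) for i in citers) + eps
    scores = {}
    for c in classes:
        # Bayesian vote on the references: Laplace-smoothed class frequency
        # (one plausible reading of the paper's "Bayesian approach").
        p_ref = (np.sum(labels[refs] == c) + 1.0) / (k + len(classes))
        # Distance-weighted vote on the citers: closer citers count more.
        w_cit = sum(1.0 / (d[i] + eps) for i in citers if labels[i] == c)
        scores[c] = p_ref + w_cit / cit_total  # simple additive combination
    return max(scores, key=scores.get)

# Toy usage with two tiny bags per class (hypothetical data).
rng = np.random.default_rng(0)
train = [rng.normal(0, 1, (3, 2)), rng.normal(0, 1, (4, 2)),
         rng.normal(5, 1, (3, 2)), rng.normal(5, 1, (2, 2))]
print(bcknn_predict(train, [0, 0, 1, 1], rng.normal(5, 1, (3, 2))))  # 1
```

The two votes are simply normalized and added here; the paper itself may combine the reference and citer evidence differently.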
1 Introduction

Multi-instance learning (MI learning) has received much attention in the machine learning research field. MI learning is a variation of standard supervised learning. In standard supervised learning, each example is an instance represented by an attribute vector, augmented with a class label. The learning task is to build a classifier that predicts the class label of an unseen instance, given its attribute vector. In MI learning, however, each example consists of a bag of instances. Each bag has a class label, but the instances themselves are not labeled. The learning task is therefore to build a classifier that predicts the class label of an unseen bag [1]. Recently, some researchers have combined multi-instance learning and multi-label learning [2] to propose another machine learning framework: Multi-Instance Multi-Label learning (MIML learning) [3, 4]. In this paper, we focus our attention on multi-instance learning.

Two different approaches have been adopted to classify an unseen bag of instances in the context of the multi-instance problem. The first approach classifies a bag as negative if all the instances in it are negative, and as positive if at least one instance in it is positive [5]. In contrast, another approach classifies a bag as the maximum label among
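To make the first approach above concrete (the standard MI assumption from [5]), here is a minimal sketch; the bag representation and the threshold rule standing in for a learned instance-level classifier are illustrative assumptions.

```python
from typing import Callable, Iterable

def classify_bag(bag: Iterable[float],
                 instance_is_positive: Callable[[float], bool]) -> bool:
    # Standard MI assumption: a bag is positive if at least one of its
    # instances is positive, and negative only if all instances are negative.
    return any(instance_is_positive(x) for x in bag)

# Toy usage: a threshold rule stands in for a learned instance classifier.
is_pos = lambda x: x > 0.5
print(classify_bag([0.2, 0.1, 0.9], is_pos))  # True: one positive instance
print(classify_bag([0.2, 0.1, 0.3], is_pos))  # False: all instances negative
```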