Improving the Accuracy of the kNN Method When Using an Even Number k of Neighbors



Abstract

The kNN (k-Nearest Neighbors) method is a classification method that can show low accuracy for even values of k. This paper details a method to improve the accuracy of the kNN method in those cases. It also presents a method that can improve its accuracy for biased classification sets and for odd values of k.

1 Introduction

There are many machine learning methods for data classification [1], and several of them have been used for breast cancer prognosis and diagnosis [2–5]. This paper deals with improving the kNN (k-Nearest Neighbors) method [6]. It is a nonparametric classification method with high accuracy, and it has even been used to detect different stages of breast cancer [7, 8]. We have conducted research on the details of its use for breast cancer diagnosis and prognosis [9, 10]. When using the kNN method for breast cancer prognosis we found that its accuracy is low when the number of neighbors k is small and even. When k takes an even value there are cases where the data used for classification split into equal-sized groups, so the class is chosen not by majority but by the order of the groups used to determine the class. To improve the accuracy in those cases we have devised a method that is very effective for small even values of k; it is detailed in the following section. In Section 3 we show the results of applying it to the breast cancer prognosis data of the UCI repository [11].
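To make the tie situation concrete, the following minimal sketch (in Python, with hypothetical class labels) shows how a plain majority vote over the k nearest labels fails to produce a winner when k is even and the neighbors split evenly. This only illustrates the problem; the paper's remedy is described in the following section.

```python
from collections import Counter

def majority_vote(labels):
    """Return the majority class among the k nearest labels,
    or None when the top classes tie (possible whenever k is even)."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: the k neighbors split into equal-sized groups
    return counts[0][0]

# Hypothetical labels: with k = 4 the neighbors can split 2-2,
# so no class wins by majority.
print(majority_vote(["recur", "recur", "no-recur", "no-recur"]))  # None
print(majority_vote(["recur", "no-recur", "recur", "recur"]))     # recur
```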

When evaluating the kNN method the target data are split into two sets: one is used for classification and the other as a test set. In a typical implementation the test set is chosen at random, and this way of forming it can produce classification sets that contain data of only one class. Such sets completely bias the classification process and also lower the average accuracy measured for the method. We therefore also researched an implementation that helps avoid these cases. We show some details of this implementation and of its combination with the method that improves the accuracy for even values of k.
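One common way to avoid single-class classification sets is stratified sampling, which draws the test set from each class separately so that both sets contain every class. The sketch below illustrates this generic technique; it is not necessarily the exact implementation the authors used, and the function and variable names are illustrative.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (classification_idx, test_idx) index lists such that
    every class contributes to both sets in roughly the same
    proportion as in the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    classification_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # at least one datum of each class goes to the test set
        cut = max(1, int(len(idx) * test_fraction))
        test_idx.extend(idx[:cut])
        classification_idx.extend(idx[cut:])
    return classification_idx, test_idx
```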

2 The kNN Method

The k-Nearest Neighbors (kNN) method is a supervised nonparametric machine learning method used for classification tasks. To evaluate its accuracy, and that of other classification algorithms, we usually divide an already classified data set into two sets: one is used for the classification task and the other as a test set. Then one datum at a time is taken from the test set and compared with the data in the classification set. In Fig. 1 we show an example where an unclassified datum is brought into the classification set and its similarity to the surrounding data is measured. Similarity is usually measured with the Euclidean distance between data, but other distances can also be used.
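As a concrete illustration of this procedure, the following sketch classifies a single test datum against the classification set using the Euclidean distance and a plain majority vote. The names are illustrative, and the vote shown here is the basic scheme whose even-k ties the paper's method addresses, not the improved method itself.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(classification_X, classification_y, x, k):
    """Classify one test datum x by a majority vote among its
    k nearest neighbors in the classification set."""
    ranked = sorted(zip(classification_X, classification_y),
                    key=lambda pair: euclidean(pair[0], x))
    top = [y for _, y in ranked[:k]]
    # For even k this vote can tie, and the winner then depends on
    # an arbitrary ordering -- the case the paper's method improves.
    return max(set(top), key=top.count)
```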