Using a Genetic Algorithm for Editing k-Nearest Neighbor Classifiers
Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (Spain)
Computer Sciences Department, University of Birmingham, Birmingham (UK)
Nature Inspired Computation and Applications Laboratory, University of Science and Technology of China, Hefei, Anhui 230027 (P.R. China)
Abstract. The edited k-nearest neighbor consists of the application of the k-nearest neighbor classifier with an edited training set, in order to reduce the classification error rate. This edited training set is a subset of the complete training set in which some of the training patterns are excluded. In recent works, genetic algorithms have been successfully applied to generate edited sets. In this paper we propose three improvements of the edited k-nearest neighbor design using genetic algorithms: the use of a mean square error based objective function, the implementation of a clustered crossover, and a fast smart mutation scheme. Results achieved using the breast cancer database and the diabetes database from the UCI machine learning benchmark repository demonstrate the improvement achieved by the joint use of these three proposals.
1 Introduction
Editing a k-nearest neighbor (kNN) classifier consists of applying the kNN classifier with an edited training set in order to improve the performance of the classifier in terms of error rate [1]. This edited training set is a subset of the complete training set in which some of the training patterns are excluded. Depending on the characteristics of the database [2], and because of the exclusion of these patterns, the kNN may render better results using the edited set, in terms of both error rate and computational cost. Genetic algorithms (GAs) have been successfully applied to select the training patterns included in the edited training set. In [3] a study of editing kNN classifiers using GAs with different objective functions is presented. Several databases, such as the Iris database and the Heart database, are used in the experiments. The paper concludes that, among the analyzed objective functions, the best results are obtained when the counting estimator with a penalizing term is selected as the objective function. Another interesting article is [4], in which a GA with a novel crossover method is applied. When two parents are crossed, a high number of possible offspring are evaluated, and the best two individuals are selected. The work presented in [5] is another interesting paper, which studies kNN editing with other heuristic techniques: the authors study the use of tabu search to solve the problem of editing a 1NN classifier (nearest neighbor rule). They use the counting estimator objective function with a penalizing term, and they evaluate the results with the i
This work has been partially funded by the Comunidad de Madrid/Universidad de Alcalá (CCG06-UAH/TIC-0378) and the Spanish Ministry of Education and Science (TEC2006-13883-C04-04/TCM).
H. Yin et al. (Eds.): IDEAL 2007, LNCS 4881, pp. 1141–1150, 2007. © Springer-Verlag Berlin Heidelberg 2007
R. Gil-Pita and X. Yao
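To make the notion of an edited training set concrete, the following is a minimal sketch and not the implementation used in this paper or in [3]. It represents an edited set as a boolean mask over the training patterns (the natural encoding for a GA individual) and scores it with a counting estimator plus a penalizing term. The function names, the toy data, the use of a separate evaluation set, and the penalty form `penalty * fraction_kept` are all assumptions made for illustration.

```python
import numpy as np


def edited_knn_predict(x_train, y_train, mask, x_query, k=1):
    """Classify x_query with kNN using only the training patterns kept by mask."""
    x_kept, y_kept = x_train[mask], y_train[mask]
    # Squared Euclidean distances, shape (n_query, n_kept).
    d2 = ((x_query[:, None, :] - x_kept[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest kept patterns for each query (stable on ties).
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    n_classes = int(y_train.max()) + 1
    # Majority vote among the k nearest labels.
    return np.array(
        [np.bincount(y_kept[row], minlength=n_classes).argmax() for row in nearest]
    )


def penalized_error(mask, x_train, y_train, x_eval, y_eval, k=1, penalty=0.01):
    """Counting estimator (error rate) plus an assumed size penalty; lower is better."""
    mask = np.asarray(mask, dtype=bool)
    if mask.sum() < k:
        # Too few patterns left to run kNN: return the worst possible score.
        return 1.0 + penalty
    pred = edited_knn_predict(x_train, y_train, mask, x_eval, k)
    error_rate = float(np.mean(pred != y_eval))
    # Assumed penalizing term: proportional to the fraction of patterns kept.
    return error_rate + penalty * float(mask.mean())


# Hypothetical 1-D toy data: the pattern at x = 2.5 carries a noisy label.
x_train = np.array([[0.0], [1.0], [2.0], [3.0], [2.5], [10.0], [11.0], [12.0]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x_eval = np.array([[2.4], [2.6], [0.5], [11.0]])
y_eval = np.array([0, 0, 0, 1])

full_mask = np.ones(len(x_train), dtype=bool)
edited_mask = full_mask.copy()
edited_mask[4] = False  # exclude the noisy pattern

full_fitness = penalized_error(full_mask, x_train, y_train, x_eval, y_eval)
edited_fitness = penalized_error(edited_mask, x_train, y_train, x_eval, y_eval)
print(full_fitness, edited_fitness)
```

In this toy case, excluding the single noisy pattern lowers both the error term and the size penalty, which is the effect that editing aims for. A GA would search over such masks, using a score of this kind as the objective function.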