A Discrete Approach for Supervised Pattern Recognition
Institute of Computing, State University of Campinas, Av. Albert Einstein 1216, Campinas, São Paulo, Brazil
{papa.joaopaulo,alexandre.falcao,celso.suzuki}@gmail.com
Department of Computing, Federal University of São Carlos, Rod. Washington Luis, Km 235, São Carlos, São Paulo, Brazil
[email protected]
Abstract. We present an approach for supervised pattern recognition based on combinatorial analysis of optimum paths from key samples (prototypes), which creates a discrete optimal partition of the feature space such that any unknown sample can be classified according to this partition. A training set is interpreted as a complete graph with at least one prototype per class. The prototypes compete among themselves, and each one defines an optimum-path tree whose nodes are the samples more strongly connected to it than to any other prototype. The result is an optimum-path forest in the training set. A test sample is assigned to the class of the prototype that offers it the optimum path in the forest. The classifier is designed to achieve zero classification errors in the training set, without over-fitting, and to learn from its errors. Comparisons on several datasets show the advantages of the method in accuracy and efficiency with respect to support vector machines.
Keywords: supervised learning, optimum-path forest, image foresting transform, pattern recognition, graph-search algorithms.
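To make the competition among prototypes concrete, the sketch below implements the idea described in the abstract under simplifying assumptions: the prototypes are taken as given (the paper estimates them itself, which is not reproduced here), arc weights are Euclidean distances, and the path cost is the maximum arc weight along the path. The function names (train_opf, classify_opf) and the NumPy implementation are illustrative, not the authors' code.

```python
import numpy as np

def train_opf(X, y, prototypes):
    """Sketch of optimum-path forest training on a complete graph.

    X          : (n, d) array of training feature vectors.
    y          : (n,) array of class labels.
    prototypes : indices of the prototype samples (assumed given,
                 at least one per class).
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # complete graph

    cost = np.full(n, np.inf)          # optimum-path cost of each sample
    label = np.asarray(y).copy()       # label inherited from the winning prototype
    done = np.zeros(n, dtype=bool)
    cost[prototypes] = 0.0             # prototypes start the competition at cost zero

    # Dijkstra-like propagation with the f_max path cost
    # (maximum arc weight along the path).
    for _ in range(n):
        u = np.argmin(np.where(done, np.inf, cost))
        done[u] = True
        offered = np.maximum(cost[u], dist[u])   # cost of extending u's path by one arc
        better = (~done) & (offered < cost)
        cost[better] = offered[better]
        label[better] = label[u]

    return cost, label

def classify_opf(X_train, cost, label, X_test):
    """Assign each test sample the label of the training sample that offers
    it the cheapest one-arc extension of an optimum path in the forest."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    path_cost = np.maximum(cost[None, :], d)
    winners = np.argmin(path_cost, axis=1)
    return label[winners]
```

Because the graph is complete, this propagation is quadratic in the number of training samples; the paper's formulation builds on the image foresting transform, and the sketch makes no attempt to match its data structures or efficiency.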
1 Introduction
Graph-based approaches for pattern recognition are usually unsupervised (data clustering). Most of them follow the same principle: build a neighborhood graph over the data samples and then remove inconsistent arcs according to some criterion [14]. Other approaches interpret data clustering as a graph-cut problem [26]. These approaches suffer from overlap between different clusters and offer no guarantee of success, because it is hard to assign samples to their correct class without any prior knowledge. Such problems are better handled by supervised approaches.
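As an illustration of the neighborhood-graph principle cited above [14], the following sketch clusters data by pruning a minimum spanning tree. A crude global threshold stands in for Zahn's local inconsistency test, and the `factor` parameter is an assumption of this sketch, not something specified in the text.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import cdist

def mst_clustering(X, factor=2.0):
    """Cluster by removing 'inconsistent' arcs from a minimum spanning tree:
    edges longer than `factor` times the mean MST edge length are dropped,
    and the remaining connected components are reported as clusters."""
    d = cdist(X, X)                                  # complete graph of pairwise distances
    mst = minimum_spanning_tree(csr_matrix(d)).toarray()

    edges = mst[mst > 0]
    threshold = factor * edges.mean()                # global stand-in for an inconsistency criterion
    mst[mst > threshold] = 0.0                       # remove inconsistent arcs

    adj = csr_matrix(np.maximum(mst, mst.T) > 0)     # symmetric adjacency of the pruned forest
    n_clusters, labels = connected_components(adj, directed=False)
    return n_clusters, labels
```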
Artificial neural networks (ANN) [13] and support vector machines (SVM) [4] are supervised approaches that have been actively pursued in recent years. An ANN multilayer perceptron (ANN-MLP) trained by backpropagation, for example, is an unstable classifier. Its accuracy may be improved, at extra computational cost, by using multiple classifiers and algorithms (e.g., bagging and boosting) to train classifier collections [16]. However, there seems to be an unknown limit on the number of classifiers beyond which accuracy degrades [23]. ANN-MLP assumes that the classes can be separated by hyperplanes in the feature space, an assumption that unfortunately does not hold in practice. SVM was proposed to overcome this problem by assuming that the classes can be separated by optimum hyperplanes in a higher-dimensional space. Although SVM usually provides reasonable accuracies, its computational
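For context on the SVM baseline discussed above (and compared against in the abstract), a minimal kernel-SVM setup might look as follows; scikit-learn, the synthetic data, and all parameter values are assumptions of this sketch, not details from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data standing in for the datasets used in the comparison.
X, y = make_classification(n_samples=500, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# RBF-kernel SVM: samples are mapped implicitly to a higher-dimensional
# space where optimum separating hyperplanes are sought.
svm_baseline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm_baseline.fit(X_tr, y_tr)
print("hold-out accuracy:", svm_baseline.score(X_te, y_te))
```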