Optimising Multiple Kernels for SVM by Genetic Programming




1 Babeş-Bolyai University, Cluj-Napoca, Romania
2 LITIS, EA 4108, INSA, Rouen, France
[email protected], {arogozan,pecuchet}@insa-rouen.fr

Abstract. Kernel-based methods have shown significant performance in solving supervised classification problems. However, there is no rigorous methodology capable of learning or evolving the kernel function together with its parameters. In fact, most classic kernel-based classifiers use only a single kernel, whereas real-world applications have emphasized the need to consider a combination of kernels - also known as a multiple kernel (MK) - in order to boost the classification accuracy by adapting better to the characteristics of the data. Our aim is to propose an approach capable of automatically designing a complex multiple kernel (CMK) and optimising its parameters by evolutionary means. To achieve this purpose we propose a hybrid model that combines a Genetic Programming (GP) algorithm and a kernel-based Support Vector Machine (SVM) classifier. Each GP chromosome is a tree that encodes the mathematical expression of an MK function. Numerical experiments show that an SVM involving our evolved complex multiple kernel (eCMK) performs better than the classical simple kernels. Moreover, on the considered data sets, our eCMK outperforms both a state-of-the-art convex linear MK (cLMK) and an evolutionary linear MK (eLMK). These results emphasize the fact that the SVM algorithm requires a combination of kernels more complex than a linear one.
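The idea that a GP chromosome is a tree whose evaluation yields a multiple-kernel function can be sketched concretely. The snippet below is a minimal illustration, not the paper's actual operator set or evolved expressions: leaves are standard base kernels (RBF, polynomial), and internal nodes apply operations (weighted sum, product) that are known to preserve kernel validity, so the whole tree evaluates to a valid kernel.

```python
# Illustrative sketch: a kernel-expression tree in the spirit of a GP
# chromosome. The operators and weights here are hypothetical examples,
# not those evolved in the paper.
import math

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) base kernel on real-valued vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)

def poly(x, y, degree=2, coef0=1.0):
    """Polynomial base kernel."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

def eval_tree(node, x, y):
    """Recursively evaluate a kernel-expression tree at a pair (x, y).

    A leaf is a callable base kernel; an internal node is a tuple
    (operator, child, child, ...) where a child may also be a numeric
    weight (constant leaf).
    """
    if callable(node):
        return node(x, y)
    op, *children = node
    vals = [c if isinstance(c, (int, float)) else eval_tree(c, x, y)
            for c in children]
    if op == '+':          # sum of kernels is a kernel
        return sum(vals)
    if op == '*':          # product (incl. positive scaling) is a kernel
        prod = 1.0
        for v in vals:
            prod *= v
        return prod
    raise ValueError(f"unknown operator {op!r}")

# Example chromosome: K(x, y) = 0.7 * rbf(x, y) + 0.3 * poly(x, y)
tree = ('+', ('*', 0.7, rbf), ('*', 0.3, poly))
x, y = [1.0, 0.0], [0.0, 1.0]
k_xy = eval_tree(tree, x, y)
k_yx = eval_tree(tree, y, x)
# The tree is symmetric by construction: K(x, y) == K(y, x).
```

Since sums and products of valid kernels are again valid kernels, a GP search over such trees explores a space of legitimate multiple kernels; the fitness of each chromosome can then be measured by the accuracy of an SVM trained with the encoded kernel.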

J. van Hemert and C. Cotta (Eds.): EvoCOP 2008, LNCS 4972, pp. 230–241, 2008.
© Springer-Verlag Berlin Heidelberg 2008

1 Introduction

Various classification techniques have been used to correctly detect the labels associated with items. Kernel-based techniques, such as the Support Vector Machine (SVM) [1], are an example of such intensively explored classifiers. These methods represent the data by means of a kernel function, which defines similarities between pairs of data points [2]. One reason for the success of kernel-based methods is that the kernel function takes relationships that are implicit in the data and makes them explicit, so that patterns are detected more easily. The selection of an appropriate kernel K is the most important design decision in SVM, since it implicitly defines the feature space F and the map φ. An SVM will work correctly even if we do not know the exact form of the features used in F. The performance of an SVM algorithm also depends on several parameters. One of them, denoted C, controls the trade-off between maximizing the margin and classifying without error; the other parameters concern the kernel function. For simplicity, Chapelle [3] has proposed to denote all these parameters as hyper-parameters. All these hyper-parameters have to be tuned. This is a difficult problem, since the estimate of the error on a validation set is not an explicit function of these parameters. The selection