In-depth analysis of SVM kernel learning and its components


ORIGINAL ARTICLE

Ibai Roman1 · Roberto Santana1 · Alexander Mendiburu1 · Jose A. Lozano1,2

Received: 29 April 2020 / Accepted: 5 October 2020 / © Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

The performance of support vector machines in nonlinearly separable classification problems strongly relies on the kernel function. Toward an automatic machine learning approach for this technique, many works have addressed the challenge of automatically learning well-performing kernels for support vector machines. However, these works have been carried out without a thorough analysis of the set of components that influence the behavior of support vector machines and their interaction with the kernel. These components are related in an intricate way, and it is difficult to provide a comprehensible analysis of their joint effect. In this paper, we try to fill this gap by introducing the steps necessary to understand these interactions, and we provide clues about where the research community should place its emphasis. First, we identify all the factors that affect the final performance of support vector machines in relation to the elicitation of kernels. Next, we analyze these factors independently or in pairs and study the influence each component has on the final classification performance, providing recommendations and insights into the kernel setting for support vector machines.

Keywords: SVM · Kernel learning · Genetic programming · Automatic machine learning

Correspondence: Ibai Roman ([email protected]). Co-authors: Roberto Santana ([email protected]), Alexander Mendiburu ([email protected]), Jose A. Lozano ([email protected]).

1 Intelligent Systems Group, University of the Basque Country, UPV/EHU, Paseo Manuel de Lardizabal 1, 20018 Donostia, Spain
2 Basque Center for Applied Mathematics, BCAM, Alameda de Mazarredo 14, 48009 Bilbao, Spain

1 Introduction

Support vector machines (SVMs) [52] have long been the reference paradigm in supervised classification and regression. Although the field is nowadays dominated by deep learning approaches, SVMs remain one of the best alternatives when the requirements of deep neural networks are not met. When applied to binary classification problems, SVMs separate the samples of the two classes by means of a hyperplane that maximizes the margin to the nearest samples, thereby ensuring proper generalization. SVMs can also handle nonlinearly separable problems by means of a kernel function [4], and when this kernel meets Mercer's condition [33], the optimal hyperplane can still be found.

Although SVMs are a suitable tool for solving classification problems, they involve several components that must be adjusted in order to obtain good performance. Among these, the choice of the kernel heavily influences the performance of SVMs, and there is no rule of thumb for selecting it. While some standard kernels prop
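The influence of the kernel choice described above can be illustrated with a minimal sketch (not part of this study) using scikit-learn's SVC on a synthetic nonlinearly separable problem; the dataset and hyperparameter values are illustrative assumptions:

```python
# Illustrative sketch: on data that no hyperplane separates in the input
# space, the kernel choice decides whether the SVM can classify well.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: nonlinearly separable in the original space.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)

# The RBF kernel implicitly maps the samples into a space where a
# separating hyperplane exists, so it clearly outperforms the linear
# kernel on this problem; a linear kernel stays near chance level.
print(scores)
```

Both models use the same regularization parameter C, so the accuracy gap here is attributable to the kernel alone, which is the kind of component interaction analyzed in this paper.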