ASYMPTOTIC EFFICIENCY OF KERNEL SUPPORT VECTOR MACHINES (SVM)

V. I. Norkin and M. A. Keyzer
UDC 519:234:24:85
The paper analyzes the asymptotic properties of Vapnik's SVM-estimates of a regression function as the size of the training sample tends to infinity. The estimation problem is considered as infinite-dimensional minimization of a regularized empirical risk functional in a reproducing kernel Hilbert space. The rate of convergence of the risk functional on SVM-estimates to its minimum value is established. Sufficient conditions for the uniform convergence of SVM-estimates to the true regression function with unit probability are given.

Keywords: machine learning, estimation of dependences, recognition, kernel estimate, support vector machine (SVM), ill-posed problems, regularization, consistency, rate of convergence.

INTRODUCTION

The support vector method/machine (SVM) plays an important role in modern pattern recognition theory since it successfully competes with the best-developed multilevel neural network recognition systems [1, 2]. Its nonlinear version is called the kernel SVM. It belongs to the statistical learning theory founded in [3–5]. The essence of the theory is as follows [1–11].

Consider a learning problem, i.e., the problem of estimating a dependence (model) from "input–output" observations. Statistical learning theory assumes that the "input" is generated randomly and the "output" is observed with errors; thus, the observations are assumed to be generated independently from a joint probability distribution on the "input–output" space. Recognition and classification problems are important special cases of the estimation of a dependence with discrete "output."

In statistical learning theory, estimation of dependences, pattern recognition, and classification are reduced to two-criteria optimization problems on some set of feasible dependences: one criterion (the empirical risk functional) is responsible for the adequacy of the model to the observed data, and the other (complexity) for its generalizing ability, i.e., for the adequacy of the model to new data. The norm of the model as an element of the model space is used as a measure of its complexity. The main tasks of this approach are to set up the space and the subset of feasible models, to efficiently enumerate models, and to find a compromise between the degree of adequacy of the model to the data and the model complexity; a standard formalization of this trade-off is sketched below.

In statistical learning theory, the model space is linear and consists of linear or nonlinear (kernel) functions of the vector of "inputs," each observed vector of "inputs" being associated with a model from this space. The support vector machine assumes that the unknown model is sought as a linear combination of only those models of the space that correspond to the observed "inputs." Computationally, the method reduces to finite-dimensional quadratic optimization problems with the number of variables equal to the number of observations (see the code sketch below). The mathematical basis of the statistical learning theory and support vector machines […]
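To make the two-criteria setup concrete, one standard formalization of the regularized empirical risk minimization problem in a reproducing kernel Hilbert space \(\mathcal{H}_K\) reads as follows (the notation \(\ell\), \(K\), \(\lambda_n\) is ours, not the paper's):

\[
f_n \;=\; \arg\min_{f \in \mathcal{H}_K} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(y_i, f(x_i)\bigr) \;+\; \lambda_n \,\|f\|_{\mathcal{H}_K}^2 .
\]

Here the first term is the empirical risk on the observations \((x_i, y_i)\), and the penalty \(\lambda_n \|f\|_{\mathcal{H}_K}^2\) is the complexity criterion, the squared norm of the model in the model space. By the representer theorem, a minimizer is a linear combination of kernel sections at the observed inputs,

\[
f_n(x) \;=\; \sum_{i=1}^{n} \alpha_i\, K(x_i, x),
\]

which reduces the infinite-dimensional problem to one in the \(n\) coefficients \(\alpha_1, \dots, \alpha_n\).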
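As an illustration of this finite-dimensional reduction, the following self-contained Python sketch (ours, not the authors') computes a kernel estimate under the squared loss; with this loss, instead of the SVM's epsilon-insensitive loss, the n-variable quadratic problem is solved by a single linear system. The Gaussian kernel and the regularization weight lam are illustrative assumptions.

import numpy as np

def gaussian_kernel(X1, X2, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2); the Gaussian
    # kernel is an illustrative choice, not prescribed by the paper.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit(X, y, lam=1e-2, gamma=1.0):
    # Representer theorem: the RKHS minimizer is f(x) = sum_i alpha_i K(x_i, x),
    # so only the n coefficients alpha must be found. With squared loss the
    # regularized empirical risk (1/n)*||K a - y||^2 + lam * a^T K a is
    # minimized by solving the linear system (K + n*lam*I) a = y.
    n = len(X)
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    # Evaluate f(x) = sum_i alpha_i K(x_i, x) at the new inputs.
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

# Usage: recover a noisy sine dependence from n = 50 "input-output" pairs.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
alpha = fit(X, y)
print(predict(X, alpha, np.array([[0.0], [1.5]])))  # roughly sin(0), sin(1.5)

For the squared loss, stationarity of \((1/n)\|K\alpha - y\|^2 + \lambda\, \alpha^\top K \alpha\) in \(\alpha\) gives \((K + n\lambda I)\alpha = y\), which is exactly the linear system solved in fit; for the SVM losses the same reduction leads to a quadratic program in the \(n\) variables \(\alpha\).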