Early History of Support Vector Machines

Many of the ideas now being developed in the framework of Support Vector Machines were first proposed by V. N. Vapnik and A. Ya. Chervonenkis (Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia) in the framework of the “Gener

  • PDF / 138,134 Bytes
  • 8 Pages / 439.36 x 666.15 pts Page_size
  • 15 Downloads / 223 Views

DOWNLOAD

REPORT


Early History of Support Vector Machines Alexey Ya. Chervonenkis

Abstract Many of the ideas now being developed in the framework of Support Vector Machines were first proposed by V. N. Vapnik and A. Ya. Chervonenkis (Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia) in the framework of the “Generalised Portrait Method” for computer learning and pattern recognition. The development of these ideas started in 1962 and they were first published in 1964.

3.1 The “Generalised Portrait” Method for Pattern Recognition (1962) 3.1.1 Initial Heuristic Ideas Patterns (pictures) were considered as unit vectors x (jxj D 1) in some Euclidian space. That means they are situated on a unit sphere. A class of patterns is then a subset S of such a sphere. Of course, this is a heuristic: in reality vectors (patterns) may not be unit vectors. The “generalised portrait” of a class S was defined as the unit vector ' that attains max min.'; x/ D c: '

x2S

It means that the vector ' should be closest to the vectors in the set S that are most distant from '. And we hope that c > 0; A. Ya. Chervonenkis () Institute of Control Sciences, Laboratory 38, Profsoyusnaya ulitsa 65, Moscow 117997, Russia e-mail: [email protected] B. Schölkopf et al. (eds.), Empirical Inference, DOI 10.1007/978-3-642-41136-6__3, © Springer-Verlag Berlin Heidelberg 2013

13

14

A. Ya. Chervonenkis

i.e., that the set S is rather compact and does not cover too large part of the sphere. See [3]. In this way we try to cut off the segment of the unit sphere which contains completely the set S and has the minimum volume. This is done by the hyperplane .'; x/ D c; and the segment is determined by the inequality ('; x/  c. For all vectors in the set S it is true that the scalar product .'; x/  c; and there are some vectors of S for which .'; x/ D c: They were called support vectors, or marginal vectors. All marginal vectors of the set S are just on the edge of the segment; all others are within it.

3.1.2 Reducing to a Convex (Quadratic) Programming Problem Instead of maximising the scalar product of marginal vectors with the vector ' of a fixed norm, we may minimise some collinear vector under fixed restrictions. Thus our problem is equivalent to the following: find a vector with minimum norm (or its square . ; /) under the restrictions . ; x/  1; for all x 2 S. Then by the normalisation of this vector to norm 1, 'D

=j j;

we get the required vector ' (of course, it is true only if c > 0). So we have reduced the problem to a quadratic programming problem.

3.1.3 Generalisation of the Idea When we tried to apply this idea to practical pattern recognition learning (using a training sequence as the set S, i.e., using only patterns of the class we want to

3 Early History of Support Vector Machines

15

recognise), we found out that it is impossible in most cases to achieve good results without presenting examples of the opposite class or classes. In reality, we want the scalar products .'; x/ for all vectors x of the opposite class or cl