New smoothing SVM algorithm with tight error bound and efficient reduced techniques
Shuisheng Zhou · Jiangtao Cui · Feng Ye · Hongwei Liu · Qiang Zhu

Received: 8 March 2012 / Published online: 6 June 2013
© Springer Science+Business Media New York 2013

S. Zhou (corresponding author) · F. Ye · H. Liu · Q. Zhu: School of Science, Xidian University, Xi'an 710071, China; e-mail: [email protected]
J. Cui: School of Computer Science and Technology, Xidian University, Xi'an 710071, China
Abstract  Quadratically convergent algorithms for training SVMs via smoothing methods are discussed in this paper. By smoothing the objective function of an SVM formulation, Lee and Mangasarian [Comput. Optim. Appl. 20(1):5-22, 2001] presented one such algorithm, called SSVM, and proved that the error bound between the smooth problem and the original one is $O(1/p)$ for a large positive smoothing parameter $p$. We derive a new method by smoothing the optimality conditions of the SVM formulation, and we prove that its error bound is $O(1/p^2)$, which is better than Lee and Mangasarian's result. Based on the SMW identity and on updating the Hessian iteratively, some acceleration techniques are proposed to solve the Newton equation with lower computational complexity for reduced smooth SVM algorithms. Extensive experimental results show that the proposed smoothing method has the same accuracy as SSVM, whose error bound is also tightened to $O(1/p^2)$ in this paper, and that the proposed acceleration techniques are efficient for solving large-scale problems by RSVM.

Keywords  SSVM · Smoothing algorithm · Error bound · Reduced methods
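The smoothing idea summarized above can be illustrated on the plus function $(x)_+ = \max(0, x)$, whose non-smoothness is what both SSVM and the proposed method work around. The following is a minimal NumPy sketch, assuming the integral-of-sigmoid smoothing $x + \frac{1}{p}\log(1 + e^{-px})$ commonly associated with SSVM; the function names and sample points are illustrative, not from the paper. It simply shows the approximation gap shrinking as the smoothing parameter $p$ grows, which is what the $O(1/p)$ and $O(1/p^2)$ error bounds quantify.

```python
import numpy as np

def plus(x):
    """The plus function (x)_+ = max(0, x)."""
    return np.maximum(0.0, x)

def smooth_plus(x, p):
    """Smooth approximation of (x)_+ with smoothing parameter p > 0.

    Integral-of-sigmoid smoothing: x + (1/p) * log(1 + exp(-p * x)),
    written with logaddexp for numerical stability at large |x|.
    """
    return np.logaddexp(0.0, p * x) / p

x = np.linspace(-2.0, 2.0, 2001)
for p in (1.0, 10.0, 100.0):
    gap = np.max(np.abs(smooth_plus(x, p) - plus(x)))
    print(f"p = {p:6.1f}   max |smooth - plus| = {gap:.6f}")
```

The printed gaps decrease roughly like $\log(2)/p$, matching the first-order $O(1/p)$ picture; the paper's contribution is a smoothing of the optimality conditions whose bound improves to $O(1/p^2)$.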
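The abstract also mentions solving the Newton equation cheaply via the SMW (Sherman-Morrison-Woodbury) identity when the Hessian is a diagonal plus a low-rank term, as happens in reduced SVMs. The sketch below is a generic illustration of that identity for a system $(\frac{1}{\nu}I + ZZ^{\top})u = r$ with a tall-thin $Z \in \mathbb{R}^{m \times k}$; it is not a reproduction of the paper's reduced algorithm, and the names and sizes are made up for the example.

```python
import numpy as np

def solve_diag_plus_lowrank(nu, Z, r):
    """Solve ((1/nu) * I + Z Z^T) u = r with the SMW identity.

    Instead of factoring the m x m matrix, only a k x k system is solved:
        ((1/nu) I + Z Z^T)^{-1} = nu * (I - Z ((1/nu) I_k + Z^T Z)^{-1} Z^T),
    so the cost is O(m k^2) instead of O(m^3) when k << m.
    """
    m, k = Z.shape
    small = np.eye(k) / nu + Z.T @ Z          # k x k system matrix
    t = np.linalg.solve(small, Z.T @ r)       # k-dimensional solve
    return nu * (r - Z @ t)

# Tiny consistency check against a dense solve (illustrative sizes only).
rng = np.random.default_rng(0)
m, k, nu = 500, 20, 10.0
Z = rng.standard_normal((m, k))
r = rng.standard_normal(m)
u_smw = solve_diag_plus_lowrank(nu, Z, r)
u_ref = np.linalg.solve(np.eye(m) / nu + Z @ Z.T, r)
print(np.allclose(u_smw, u_ref))              # True
```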
1 Introduction

Based on Vapnik and Chervonenkis' structural risk minimization principle [28, 29], the Support Vector Machine (SVM) has emerged as a state-of-the-art tool for classification and regression. SVM solves a convex quadratic programming problem, and its solution is not only explicitly defined but also sparse. SVM has been widely used in
many application areas [28, 29], such as character recognition, disease diagnosis, time series prediction, etc.

Given a training dataset $\{(x_i, y_i)\}_{i=1}^{m} \subset \mathbb{R}^n \times \{-1, 1\}$, training an SVM amounts to solving the following formulation (see, e.g., [3, 13]):

$$\min_{w \in \mathcal{H},\, b \in \mathbb{R}} \; \frac{1}{2}\|w\|^2 + \frac{C}{\sigma}\sum_{i=1}^{m} \max\bigl(0,\, 1 - y_i\bigl(\langle w, x_i\rangle_{\mathcal{H}} + b\bigr)\bigr)^{\sigma}, \tag{1}$$
or the formulation with a regularization term $\frac{1}{2}b^2$ of the bias $b$ added to the objective function [16, 17, 22, 23]:

$$\min_{w \in \mathcal{H},\, b \in \mathbb{R}} \; \frac{1}{2}\bigl(\|w\|^2 + b^2\bigr) + \frac{C}{\sigma}\sum_{i=1}^{m} \max\bigl(0,\, 1 - y_i\bigl(\langle w, x_i\rangle_{\mathcal{H}} + b\bigr)\bigr)^{\sigma}, \tag{2}$$
with $\sigma = 1$ or $2$, tradeoff parameter $C > 0$, and $x_i := \phi(x_i) \in \mathcal{H}$ for nonlinear classification problems, where $\phi(\cdot)$ maps $x_i$ into a high-dimensional, possibly infinite-dimensional, feature space. By reproducing kernel theory, $\mathcal{H}$ is a reproducing kernel Hilbert space [13, 26] associated with a kernel function $k: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ satisfying $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j)\rangle_{\mathcal{H}}$. The learning result $w$ can be represented as a linear combination of the kernel functions:

$$w = \sum_{i=1}^{m} \beta_i\, k(x_i, \cdot), \tag{3}$$
so the high-dimensional map $\phi(\cdot)$ need not be used explicitly. By plugging (3) into (1) or (2), the
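To make the preceding formulations concrete, the following sketch evaluates objective (1), and optionally the variant (2) with the extra $\frac{1}{2}b^2$ term, after substituting the kernel expansion (3). It relies only on the standard identities $\|w\|^2 = \beta^{\top} K \beta$ and $\langle w, \phi(x_i)\rangle_{\mathcal{H}} = (K\beta)_i$ for $w$ as in (3); the Gaussian kernel, the parameter values, and the function names are illustrative assumptions, not choices made by the paper.

```python
import numpy as np

def gaussian_kernel(X, gamma=0.5):
    """Gram matrix K with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def svm_objective(beta, b, K, y, C=1.0, sigma=2, reg_bias=False):
    """Objective (1) (or (2) if reg_bias=True) with w given by expansion (3).

    ||w||^2 = beta^T K beta and <w, phi(x_i)> + b = (K beta)_i + b,
    so the hinge terms max(0, 1 - y_i(<w, x_i> + b))^sigma need only K.
    """
    margins = y * (K @ beta + b)
    hinge = np.maximum(0.0, 1.0 - margins) ** sigma
    obj = 0.5 * beta @ K @ beta + (C / sigma) * hinge.sum()
    if reg_bias:
        obj += 0.5 * b**2        # the extra (1/2) b^2 term of formulation (2)
    return obj

# Illustrative evaluation on random data (not an optimizer).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = np.sign(rng.standard_normal(50))
K = gaussian_kernel(X)
beta = 0.01 * rng.standard_normal(50)
print(svm_objective(beta, 0.0, K, y, C=1.0, sigma=2))
print(svm_objective(beta, 0.0, K, y, C=1.0, sigma=2, reg_bias=True))
```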