Multi-parameter safe screening rule for hinge-optimal margin distribution machine

Mengdan Ma1 · Yitian Xu1

1 College of Science, China Agricultural University, Beijing 100083, China
Corresponding author: Yitian Xu

Accepted: 13 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

The optimal margin distribution machine (ODM) is an efficient algorithm for classification problems. ODM optimizes the margin distribution by simultaneously maximizing the margin mean and minimizing the margin variance, which gives it better generalization performance. However, it is relatively time-consuming for large-scale problems. In this paper, we propose a hinge loss-based optimal margin distribution machine (Hinge-ODM), which derives a simplified substitute formulation and speeds up the solving process without noticeably affecting accuracy. In addition, inspired by its sparse solution, we put forward a multi-parameter safe screening rule for Hinge-ODM, called MSSR-Hinge-ODM. Based on the MSSR, most non-support vectors can be identified and removed before training, so the scale of the dual problem is greatly reduced. Moreover, the MSSR is safe: it yields exactly the same optimal solutions as the original problem. Furthermore, a fast algorithm, DCDM, is introduced to solve the reduced Hinge-ODM. Finally, we integrate the MSSR into the grid search method to accelerate the whole training process. Experimental results on twenty data sets demonstrate the superiority of the proposed methods.

Keywords Optimal margin · Hinge loss · Safe screening · Multi-parameter · DCDM
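As a concrete reference for the margin-distribution idea summarized above, the display below sketches the quantities involved in the linear case. It is only a schematic template in the style of the LDM/ODM literature, not the exact Hinge-ODM formulation derived later in the paper; the trade-off parameters λ1, λ2, C and the hinge-type slacks ξ are illustrative.

% Margin of sample (x_i, y_i) with respect to a linear hyperplane w (schematic)
\gamma_i = y_i \, w^{\top} x_i, \qquad i = 1, \dots, m

% First- and second-order statistics of the margin distribution: mean and variance
\bar{\gamma} = \frac{1}{m} \sum_{i=1}^{m} \gamma_i, \qquad
\hat{\gamma} = \frac{1}{m} \sum_{i=1}^{m} \left( \gamma_i - \bar{\gamma} \right)^{2}

% Schematic margin-distribution objective with hinge-type slacks:
% maximize the margin mean, minimize the margin variance
\min_{w,\, \xi \geq 0} \;\; \frac{1}{2} \| w \|^{2} + \lambda_1 \hat{\gamma} - \lambda_2 \bar{\gamma} + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y_i \, w^{\top} x_i \geq 1 - \xi_i, \;\; i = 1, \dots, m

Safe screening in this setting exploits the sparsity of the dual solution: dual variables that can be certified in advance to sit at a bound (non-support vectors) are fixed before training, so only the remaining variables need to be optimized at each grid point.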

1 Introduction

The support vector machine (SVM) has been one of the most popular learning algorithms for data classification over the past few decades [1–4]. SVM seeks an optimal separating hyperplane such that samples from different classes are separated correctly. It is based on the idea of maximizing the minimum margin, i.e., the smallest distance from the samples to the classification hyperplane, and margin theory [2] provides solid support for this idea. However, it took great effort to apply margin theory to explaining the good generalization of AdaBoost, a representative ensemble method [5]. In particular, the phenomenon that AdaBoost appears resistant to over-fitting made the minimum margin controversial [6]. It was not until 2006 that Reyzin et al. [7] found that pursuing only a larger minimum margin can sometimes lead to a poor margin distribution.
They argued that, for generalization performance, the margin distribution is more important than a single minimum margin. Gao and Zhou's studies [8, 9] confirmed this conjecture and explained AdaBoost in terms of margin theory. Furthermore, they showed that the margin distribution should be characterized by both the first- and second-order statistics of the margin, i.e., the margin mean and the margin variance. In 2014, Zhou [10] extended this idea to SVM and proposed the Large margin Distribution Machine (LDM). LDM tries to achieve strong generalization performance by directly maximizing the margin