Combining Boundary Detector and SND-SVM for Fast Learning



ORIGINAL ARTICLE

Combining Boundary Detector and SND-SVM for Fast Learning

Yugen Yi¹ · Yanjiao Shi² · Wenle Wang¹ · Gang Lei¹ · Jiangyan Dai³ · Hao Zheng⁴

Received: 1 October 2019 / Accepted: 1 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

* Hao Zheng [email protected]
¹ School of Software, Jiangxi Normal University, Nanchang, China
² School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai, China
³ School of Computer Engineering, Weifang University, Weifang, China
⁴ School of Engineering Information, Nanjing Xiaozhuang University, Nanjing, China

Abstract

As a state-of-the-art multi-class supervised novelty detection method, the supervised novelty detection support vector machine (SND-SVM) is extended from the one-class support vector machine (OC-SVM). It still requires solving a time-consuming quadratic programming (QP) problem whose scale is the number of training samples multiplied by the number of normal classes. To speed up SND-SVM learning, we propose a down-sampling framework for SND-SVM. First, the learning result of SND-SVM is decided only by the few samples that have non-zero Lagrange multipliers. We point out that the potential samples with non-zero Lagrange multipliers are located in the boundary regions of each class. Second, the samples located in boundary regions can be found by a boundary detector. Therefore, any boundary detector can be incorporated into the proposed down-sampling framework for SND-SVM. In this paper, we use a classical boundary detector, the local outlier factor (LOF), to illustrate the effectiveness of our down-sampling framework for SND-SVM. Experiments conducted on several benchmark and synthetic datasets show that SND-SVM becomes much faster to train after down-sampling.

Keywords SND-SVM · Critical samples · Boundary detection · Subset selection
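To make the down-sampling idea concrete, the following is a minimal sketch in Python. It assumes scikit-learn's LocalOutlierFactor as the boundary detector; since SND-SVM itself is not available in standard libraries, a one-class SVM stands in for the final learner, and the helper name boundary_subset and the keep_ratio threshold are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: per class, keep only the samples that LOF scores as most
# outlying (i.e., closest to the class boundary), then train on the subset.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

def boundary_subset(X, y, keep_ratio=0.3, n_neighbors=20):
    """Return indices of the keep_ratio fraction of each class whose
    LOF scores mark them as boundary-region samples (assumed heuristic)."""
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        lof = LocalOutlierFactor(n_neighbors=min(n_neighbors, len(idx) - 1))
        lof.fit(X[idx])
        # negative_outlier_factor_ is more negative for more outlying
        # points, so ascending order puts boundary candidates first.
        order = np.argsort(lof.negative_outlier_factor_)
        n_keep = max(1, int(keep_ratio * len(idx)))
        keep.append(idx[order[:n_keep]])
    return np.concatenate(keep)

# Usage with labelled normal training data X, y:
# sub = boundary_subset(X, y)
# model = OneClassSVM(gamma="scale").fit(X[sub])  # stand-in for SND-SVM
```

Training on X[sub] instead of X shrinks the QP that the SVM solver faces, which is the source of the speed-up the paper reports.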

1 Introduction

In some applications, the few abnormal samples that are inconsistent with the majority distribution(s) are more meaningful than the others. For instance, we are more interested in abnormal visits in network intrusion detection [1]. Such abnormal samples are called novelties, outliers, or anomalies. Novelty detection is widely used in machine learning and pattern recognition problems, such as network intrusion detection and medical diagnosis [2–5]. Supervised novelty detection learns a model from massive labelled samples [6]. For a test sample, the model returns whether or not it is a novelty. When all labelled samples are independent and identically distributed, supervised novelty

detection can be seen as a one-class classification problem [7–9]. However, a one-class classifier cannot be used directly when the normal samples do not obey the independent and identically distributed assumption, i.e., when they come from a mixture of distributions. In that case, one way is to treat all normal samples as belonging to a single superclass [10]; the other is to train several one-class classifiers, one per normal class [11] (both strategies are sketched below). In the former
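A brief sketch of the two baseline strategies mentioned above, assuming scikit-learn's OneClassSVM; the function names and the nu setting are illustrative assumptions, and X, y denote the labelled normal training samples.

```python
# Way 1: merge all normal classes into one superclass, fit one model.
# Way 2: fit one one-class model per normal class; a point is a novelty
# only if every class model rejects it.
import numpy as np
from sklearn.svm import OneClassSVM

def superclass_ocsvm(X):
    """Single one-class model over the whole mixture of normal classes."""
    return OneClassSVM(gamma="scale", nu=0.1).fit(X)

def per_class_ocsvms(X, y):
    """One one-class model per normal class, keyed by class label."""
    return {c: OneClassSVM(gamma="scale", nu=0.1).fit(X[y == c])
            for c in np.unique(y)}

def is_novelty(models, x):
    # predict returns +1 for inliers and -1 for outliers
    return all(m.predict(x.reshape(1, -1))[0] == -1
               for m in models.values())
```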