Research and Application of Fast Multi-label SVM Classification Algorithm Using Approximate Extreme Points
1 Department of Computer Science and Technology, Ocean University of China, Qingdao, China
2 Department of Computer Foundation, Ocean University of China, Qingdao, China
Abstract. In large-scale multi-label classification, the use of non-linear kernel support vector machine (SVM) classification algorithms is restricted by excessive training time. To address this problem, we propose the Approximate Extreme Points Multi-label Support Vector Machine (AEMLSVM) classification algorithm. AEMLSVM first uses the approximate extreme points method to extract training subsets, called representative sets, from the training dataset, and then trains the SVM on these representative sets. AEMLSVM can also adopt a cost-sensitive method to deal with imbalanced data. Experimental results on three large-scale public datasets show that AEMLSVM substantially shortens training time while obtaining results similar to those of the traditional multi-label SVM classification algorithm. It also exceeds existing fast multi-label SVM classification in both training time and effectiveness, and it has an advantage in classification time.
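The data flow described in the abstract (reduce the training set to a representative subset, then train a cost-sensitive SVM per label) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `binary_relevance`, `representative_subset`, and `cost_sensitive_weights` are hypothetical helpers, the one-label-at-a-time decomposition is a generic problem transformation, and the subset selector is a crude stand-in (points farthest from the class centroid), not the approximate extreme points method used by AEMLSVM. SVM training itself is omitted.

```python
from math import dist


def binary_relevance(Y, label):
    """Turn multi-label targets into +1/-1 targets for a single label."""
    return [1 if label in labels else -1 for labels in Y]


def representative_subset(X, y, keep):
    """Stand-in selector (assumption, NOT the paper's method): per class,
    keep the `keep` points farthest from the class centroid."""
    chosen = []
    for cls in (1, -1):
        idx = [i for i, t in enumerate(y) if t == cls]
        if not idx:
            continue
        dim = len(X[idx[0]])
        centroid = [sum(X[i][d] for i in idx) / len(idx) for d in range(dim)]
        idx.sort(key=lambda i: dist(X[i], centroid), reverse=True)
        chosen.extend(idx[:keep])
    return sorted(chosen)


def cost_sensitive_weights(y):
    """Weight each class inversely to its frequency: n / (2 * count)."""
    n = len(y)
    return {cls: n / (2 * y.count(cls)) for cls in set(y)}


# Toy data: label "a" is the minority label.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (3.0, 3.0), (3.1, 3.0), (5.0, 5.0)]
Y = [{"a"}, {"a", "b"}, {"b"}, {"b"}, {"b"}, {"b"}]

y_a = binary_relevance(Y, "a")                  # targets for label "a"
subset = representative_subset(X, y_a, keep=2)  # indices to train on
weights = cost_sensitive_weights(y_a)           # per-class misclassification cost
print(subset, weights)
```

In such a setup, the indices in `subset` and the per-class `weights` would be passed to whatever kernel SVM trainer is in use; the saving in training time comes from the trainer seeing only the reduced set.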
Keywords: Multi-label classification · Support vector machine · Extreme points · Imbalanced data

© Springer International Publishing Switzerland 2016. Z. Sun et al., in: Y. Wang et al. (Eds.): BigCom 2016, LNCS 9784, pp. 39–52, 2016. DOI: 10.1007/978-3-319-42553-5_4

1 Introduction

Multi-label classification is a typical supervised learning problem in which each example is represented by a single instance that may be associated with several labels at once, so the labels are no longer mutually exclusive [1]. Researchers have proposed many multi-label classification methods, including methods based on problem transformation strategies, SVM, neural networks, decision trees, and K-nearest neighbor (KNN) [7]. These methods have been successfully applied to text categorization [2], automatic image and video annotation [3], bioinformatics prediction [4], music emotion categorization [5], etc.

However, many current multi-label classification methods cannot work efficiently on large-scale datasets. The main restriction is excessive training time, which is especially pronounced for SVM. The traditional SVM [6] is a widely used machine learning method, but it can only solve single-instance single-label classification problems. Improved SVM algorithms such as Rank-SVM [8] can handle multi-label classification. Because non-linear data is very common in multi-label datasets, a non-linear kernel is required to achieve good classification performance, which further restricts the use of SVM on large-scale datasets. Besides, an unavoidable problem in multi-label classification algorithms is