Comparative study on classification performance between support vector machine and logistic regression
ORIGINAL ARTICLE
Abdallah Bashir Musa
Received: 4 September 2011 / Accepted: 2 January 2012 / Published online: 24 January 2012 / © Springer-Verlag 2012
Abstract Support vector machine (SVM) is a comparatively new machine learning algorithm for classification, while logistic regression (LR) is a long-established standard statistical classification method. Many comprehensive studies have compared SVM and LR, but since they were made both methods have been improved by techniques such as bagging and ensemble learning. Bagging and ensemble learning have recently become hot topics and are widely used to improve the generalization performance of a single learning algorithm. Comparing the classification performance of SVM and LR under bagging and ensemble learning is therefore an interesting issue. In this paper, classifiers were combined by averaging their estimated probabilities. Different evaluation metrics assess different characteristics of a machine learning algorithm, and a learning method may perform well on one metric yet be suboptimal on others. This study therefore uses a variety of criteria to evaluate the classification performance of the learning methods: accuracy, sensitivity, specificity, precision, F-score and the area under the receiver operating characteristic curve (AUC). The AUC has not been included in previous studies of SVM, because SVM did not support estimated probabilities at that time. Other metrics used in medical diagnosis, such as Youden's index (γ), the positive and negative likelihoods (ρ+, ρ−) and the diagnostic odds ratio, were also evaluated to convey and compare the qualities of the two algorithms. This study is distinguished by its comprehensive statistical analysis of the results of the SVM and LR algorithms on various data sets.
A. B. Musa (✉), Faculty of Mathematical Sciences and Computer, University of Gezira, Wad Madani 20, Sudan. E-mail: [email protected]
Keywords Support vector machine (SVM) · Logistic regression (LR) · Machine learning algorithm · Bagging · Ensemble · Statistical analysis
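The combination strategy and the threshold-based metrics named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the helper names, the 0.5 decision threshold and the toy probabilities are assumptions made for illustration, and the likelihood and odds-ratio formulas are the standard textbook definitions, which may differ in detail from those used in the study.

```python
from statistics import mean


def average_probabilities(prob_lists):
    """Combine classifiers by averaging their estimated P(y = 1) per sample."""
    return [mean(ps) for ps in zip(*prob_lists)]


def diagnostic_metrics(y_true, y_pred):
    """Compute metrics from binary labels (1 = positive, 0 = negative).

    Assumes every cell of the confusion matrix is non-zero; otherwise
    some of the ratios below are undefined and raise ZeroDivisionError.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
        "youden": sensitivity + specificity - 1,       # Youden's index
        "lr_pos": sensitivity / (1 - specificity),     # positive likelihood
        "lr_neg": (1 - sensitivity) / specificity,     # negative likelihood
        "dor": (tp * tn) / (fp * fn),                  # diagnostic odds ratio
    }


# Hypothetical probabilities from two classifiers on eight samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs_a = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.7, 0.1]
probs_b = [0.7, 0.6, 0.8, 0.3, 0.2, 0.4, 0.5, 0.1]

avg = average_probabilities([probs_a, probs_b])
y_pred = [1 if p >= 0.5 else 0 for p in avg]  # assumed threshold of 0.5
metrics = diagnostic_metrics(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The area under the ROC curve is left out of the sketch because it is computed from the ranking of the averaged probabilities and not from thresholded predictions, which is why it requires classifiers that output estimated probabilities.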
1 Introduction
Logistic regression (LR) [1, 2] is a multivariable method devised for dichotomous outcomes. It is a standard statistical classification method that is particularly appropriate for models involving disease state (healthy/diseased), decision making (yes/no), or mortality (dead/living). It is widely used for binary classification problems in applied sciences such as medicine, biology and epidemiology, owing to its simplicity and high interpretability. Logistic regression places special requirements on the data under consideration, such as little or no collinearity among the independent variables and linearity of the independent variables with the logit. In contrast, SVM [3, 4, 5] has recently become a very popular machine learning tool for classification. It is easy to use and uncomplicated compared with LR. Nowadays