

RESEARCH ARTICLE-ELECTRICAL ENGINEERING

A Method of Classification Performance Improvement Via a Strategy of Clustering-Based Data Elimination Integrated with k-Fold Cross-Validation

Onur Inan¹ · Mustafa Serter Uzer²

Received: 14 February 2020 / Accepted: 17 September 2020 © King Fahd University of Petroleum & Minerals 2020

Abstract
Non-systematic errors that occur during data entry or data collection produce noisy data, which reduce the success of classification systems. To eliminate such data, a classification system was developed around a new data reduction method, MKMA-RAC, which consists of a modified k-means algorithm using Relief algorithm coefficients. The main theme of this article is the elimination of noisy data and its consistent application within the classification system through k-fold cross-validation. In every fold, the support vector machine (SVM), linear discriminant analysis (LDA) and decision tree classifiers are combined with MKMA-RAC-based data reduction so that the training data are freed from noisy samples; the data reduction process is not applied to the test data. The proposed method was evaluated on the Hepatitis, Liver Disorders, SPECT images and Statlog (Heart) datasets taken from the UCI database. For these datasets, classification performance values obtained with tenfold CV are reported both with and without the proposed method. For the Hepatitis, Liver Disorders, SPECT images and Statlog (Heart) datasets, respectively, the classification success rates of the proposed system were 96.88%, 74.56%, 87.24% and 90.00% with the SVM classifier; 94.91%, 69.05%, 82.38% and 88.52% with the LDA classifier; and 96.25%, 77.73%, 88.77% and 89.63% with the decision tree classifier. The test results show that the proposed system generally achieved higher classification performance than the results reported in the literature, which is encouraging for pattern recognition applications.

Keywords Clustering-based data elimination · Relief · Medical dataset classification
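The abstract describes the fold-wise elimination only at a high level, so the following is a minimal, standard-library Python sketch of its structure under stated assumptions, not the authors' MKMA-RAC. Feature weights come from basic two-class Relief. How MKMA-RAC actually modifies k-means and which samples it removes are not specified here, so the Relief-weighted distance, the clipping of negative weights, the farthest-first initialisation and the rule "drop samples whose label disagrees with their cluster's majority label" are all hypothetical. A 1-nearest-neighbour classifier stands in for the SVM, LDA and decision tree, and every function name is illustrative.

```python
import random
from collections import Counter


def relief_weights(X, y):
    """Basic two-class Relief (Kira and Rendell), visiting every sample once."""
    n, d = len(X), len(X[0])
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]

    def diff(f, a, b):  # per-feature difference scaled to [0, 1]
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest hit
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest miss
        for f in range(d):
            # reward features that separate the miss, penalise those that separate the hit
            w[f] += (diff(f, X[i], X[m]) - diff(f, X[i], X[h])) / n
    return w


def weighted_kmeans(X, w, k=2, iters=20):
    """Lloyd's k-means with a Relief-weighted squared Euclidean distance."""
    # Assumption: negative weights are clipped; the tiny floor makes an
    # all-negative weight vector fall back to ordinary (uniform) k-means.
    w = [max(v, 0.0) + 1e-9 for v in w]

    def d2(a, c):
        return sum(wf * (af - cf) ** 2 for wf, af, cf in zip(w, a, c))

    centers = [list(X[0])]
    while len(centers) < k:  # assumed farthest-first initialisation (deterministic)
        centers.append(list(max(X, key=lambda x: min(d2(x, c) for c in centers))))

    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: d2(x, centers[c])) for x in X]
        for c in range(k):
            members = [X[i] for i in range(len(X)) if labels[i] == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def eliminate_noisy(X, y, k=2):
    """Hypothetical rule: drop samples whose label disagrees with the
    majority label of their Relief-weighted k-means cluster."""
    clusters = weighted_kmeans(X, relief_weights(X, y), k)
    majority = {}
    for c in set(clusters):
        votes = Counter(y[i] for i in range(len(y)) if clusters[i] == c)
        majority[c] = votes.most_common(1)[0][0]
    keep = [i for i in range(len(y)) if y[i] == majority[clusters[i]]]
    return [X[i] for i in keep], [y[i] for i in keep]


def one_nn(train_X, train_y, x):
    """1-nearest-neighbour stand-in for the SVM, LDA and decision tree classifiers."""
    j = min(range(len(train_X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[j]


def cross_validate(X, y, folds=10, use_elimination=True, seed=0):
    """k-fold CV accuracy; elimination only ever touches the training folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    parts = [idx[f::folds] for f in range(folds)]
    correct = 0
    for f in range(folds):
        train = [i for g in range(folds) if g != f for i in parts[g]]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        if use_elimination:
            tX, ty = eliminate_noisy(tX, ty)
        # the held-out fold is scored as-is, never filtered
        correct += sum(one_nn(tX, ty, X[i]) == y[i] for i in parts[f])
    return correct / len(X)
```

Calling `cross_validate(X, y)` and `cross_validate(X, y, use_elimination=False)` on the same data gives a with/without comparison of the kind the abstract describes. The design point worth keeping from the abstract is that `eliminate_noisy` only ever sees the training indices of a fold, so the held-out samples are never filtered.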

¹ Onur Inan, Computer Engineering, Faculty of Engineering and Architecture, Necmettin Erbakan University, Konya, Turkey
² Mustafa Serter Uzer, Electronics and Automation, Selcuk University Ilgın Vocational School, Konya, Turkey

Abbreviations
MKMA-RAC: Modified k-means algorithm using Relief algorithm coefficients
k-fold CV: k-Fold cross-validation
SVM: Support vector machine
FS: Feature selection
SPECT: Single-photon emission computed tomography
PPV: Positive predictive value
NPV: Negative predictive value
LDA: Linear discriminant analysis
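PPV and NPV appear among the abbreviations. With TP, TN, FP and FN the counts of true positives, true negatives, false positives and false negatives, the standard definitions are PPV = TP / (TP + FP), NPV = TN / (TN + FN) and accuracy = (TP + TN) / N. The short Python sketch below computes them for a binary task. The helper name `binary_metrics`, its `positive` argument and the choice to return NaN for an empty denominator are illustrative, not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, PPV and NPV from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):  # an undefined ratio is reported as NaN instead of raising
        return num / den if den else float("nan")

    return {
        "accuracy": (tp + tn) / len(pairs),
        "ppv": ratio(tp, tp + fp),  # share of positive predictions that are correct
        "npv": ratio(tn, tn + fn),  # share of negative predictions that are correct
    }
```

For example, with true labels `[1, 1, 1, 0, 0, 0, 0]` and predictions `[1, 1, 0, 0, 0, 0, 1]` there are 2 true positives, 1 false positive, 3 true negatives and 1 false negative, so PPV is 2/3 and NPV is 3/4.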

1 Introduction

Large datasets obtained from medical treatment and diagnosis processes have become one of the most important subjects of study for pattern recognition and data mining techniques. These medical datasets are also used to test newly developed artificial intelligence techniques.
