Missing data techniques in classification for cardiovascular dysautonomias diagnosis
ORIGINAL ARTICLE
Ali Idri 1,2 & Ilham Kadi 1 & Ibtissam Abnane 1 & José Luis Fernandez-Aleman 3
Received: 22 November 2018 / Accepted: 8 September 2020
© International Federation for Medical and Biological Engineering 2020
Abstract
Missing data (MD) is a common and unavoidable problem for data mining (DM)-based decision systems in e-health, since many historical medical datasets contain a large number of missing values. A pre-processing stage is therefore usually required to handle missing values before building any DM-based decision system. This paper evaluates the impact of MD techniques on classification systems for cardiovascular dysautonomias diagnosis. We analyzed and compared the accuracy rates of four classification techniques, namely random forest (RF), support vector machines (SVM), C4.5 decision tree, and Naive Bayes (NB), under two MD techniques: deletion and imputation with k-nearest neighbors (KNN). A total of 216 experiments were carried out, combining the four classifiers with three missingness mechanisms (MCAR: missing completely at random, MAR: missing at random, and NMAR: not missing at random), two MD techniques (deletion and KNN imputation), and nine MD percentages from 10 to 90% (4 × 3 × 2 × 9 = 216), on a dataset collected from the autonomic nervous system (ANS) unit of the University Hospital Avicenne in Morocco. The results suggest that using KNN imputation rather than deletion improves the accuracy rates of the four classifiers. Moreover, the MD percentage has a negative impact on classification performance regardless of the MD mechanism and MD technique used: the accuracy rates of the four classifiers decrease as the MD percentage increases.
Keywords Missing data · KNN imputation · Missingness mechanism · Cardiology
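The experimental design summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the nine MD percentages are spaced in steps of 10, that the experiments form a full factorial grid, and that KNN imputation fills a missing cell with the mean of that feature over the k nearest rows. The function names (`inject_mcar`, `delete_incomplete`, `knn_impute`), the default `k=3`, and the distance measure are all hypothetical choices made for illustration.

```python
import itertools
import math
import random

# Factors named in the abstract; the string labels are illustrative.
MECHANISMS = ["MCAR", "MAR", "NMAR"]
MD_TECHNIQUES = ["deletion", "knn_imputation"]
MD_PERCENTAGES = list(range(10, 100, 10))  # assumed spacing: 10, 20, ..., 90
CLASSIFIERS = ["RF", "SVM", "C4.5", "NB"]

# Assumed full factorial design: one experiment per combination.
experiments = list(itertools.product(
    MECHANISMS, MD_TECHNIQUES, MD_PERCENTAGES, CLASSIFIERS))


def inject_mcar(rows, fraction, rng):
    """Return a copy of rows with `fraction` of all cells set to None.

    Cells are drawn uniformly at random, independent of any value (MCAR).
    """
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    n_missing = round(fraction * len(cells))
    masked = [list(r) for r in rows]
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


def delete_incomplete(rows):
    """Listwise deletion: keep only rows with no missing value."""
    return [r for r in rows if None not in r]


def knn_impute(rows, k=3):
    """Fill each missing cell with the mean of that feature over the
    k nearest rows that have the feature observed.

    Distance is the root mean squared difference over features observed
    in both rows (an assumed choice, not taken from the paper).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common features: treat as farthest
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the cell missing if no donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

Under these assumptions `len(experiments)` equals 216, matching the count stated in the abstract. On a toy dataset, `delete_incomplete` shrinks the training set as the MD percentage grows, while `knn_impute` keeps every row, which is one plausible reason imputation could preserve more accuracy than deletion.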
1 Software Project Management Research Team, Mohammed V University, Rabat, Morocco
2 CSEHS-MSDA, Mohammed VI Polytechnic University, Ben Guerir, Morocco
3 Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Murcia, Spain

1 Introduction
Heart disease refers to a range of conditions that can affect heart function. These conditions include coronary artery disease, valvular heart disease, cardiomyopathy, heart rhythm disturbances, and heart infections [1]. Heart disease has been the leading cause of death globally over the last 15 years: of the 56.4 million deaths worldwide in 2015, 15 million were due to a heart condition [2]. For this reason, researchers have long sought to develop intelligent systems that help physicians diagnose heart disease. However, the cardiac field generates a large amount of data, which requires powerful data analysis tools to extract useful knowledge. Data mining (DM) aims to extract useful knowledge and