Comparing different feature selection algorithms for cardiovascular disease prediction



ORIGINAL PAPER

Najmul Hasan¹ · Yukun Bao²

Received: 26 September 2020 / Accepted: 22 October 2020
© IUPESM and Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Determining the key features that yield the best model fit is not an easy task in machine learning. The main objective of this study is to predict cardiovascular disease accurately by comparing different feature selection algorithms. To achieve this goal, the study employs a two-stage technique for retrieving feature subsets: we first considered three well-established families of feature selection methods (filter, wrapper, and embedded), and then extracted a feature subset using a Boolean process based on the common “True” condition across these three algorithms. To compare accuracy and identify the best predictive model, the well-known random forest, support vector classifier, k-nearest neighbors, Naive Bayes, and XGBoost models were considered. An artificial neural network (ANN) using all features served as the benchmark for further comparison. The experimental results show that the XGBoost classifier combined with the wrapper method offers precise prediction results for cardiovascular disease. The proposed approach can also be applied in domains other than healthcare informatics, such as sports analytics, bioinformatics, and financial analysis. The novelty of this empirical study is that feature selection and comparison based on the common “True” condition is an entirely new technique in medical informatics.

Keywords  Feature selection · Medical informatics · Cardiovascular · Machine learning
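The second stage described in the abstract, keeping only the features that all three selection methods mark as “True”, can be illustrated with a minimal sketch. The feature names and Boolean masks below are hypothetical placeholders, not the study's data or results, and the helper name `common_true` is our own. In practice such masks might come from any filter, wrapper, and embedded selector (for example, the `get_support()` method of scikit-learn selectors), but which library produced the selections is not stated here and is an assumption.

```python
# Hypothetical feature names and selector outputs, for illustration only.
feature_names = ["f0", "f1", "f2", "f3", "f4", "f5"]

# One Boolean mask per selection method: True means "this method kept the feature".
masks = {
    "filter":   [True, True,  False, True, False, True],
    "wrapper":  [True, False, False, True, True,  True],
    "embedded": [True, True,  False, True, False, False],
}


def common_true(masks):
    """Return a mask that is True only where every method's mask is True."""
    return [all(votes) for votes in zip(*masks.values())]


common_mask = common_true(masks)

# Keep only the features on which all three methods agree.
selected = [name for name, keep in zip(feature_names, common_mask) if keep]
print(selected)
```

With these placeholder masks, only the features that every method retained survive the logical AND; a feature dropped by even one method is excluded from the common subset.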

* Najmul Hasan
  [email protected]

  Yukun Bao
  [email protected]

¹ Center for Modern Information Management, School of Management, Huazhong University of Science and Technology, Wuhan 430074, P.R. China

² Center for Modern Information Management, School of Management, Huazhong University of Science and Technology, Wuhan 430074, P.R. China

1 Introduction
The early prediction and diagnosis of chronic diseases play a crucial role in healthcare informatics. Preventive action and effective treatment are most helpful when a chronic disease, such as diabetes, bronchiectasis, stroke, cardiovascular disease, cancer, asthma, hyperlipidemia, hypertension, or Parkinson’s disease, is diagnosed at a preliminary stage. The burden of chronic diseases is growing worldwide. Riley, Guthold [1] estimated that chronic conditions account for about 60% of the world’s 56.5 million deaths overall and about 46% of the global risk of disease. Furthermore, nearly three-quarters of all deaths worldwide were predicted to be due to chronic diseases by 2020, with 71% of deaths due to ischemic heart disease (IHD), 75% of deaths due to stroke, and 70% of deaths due to diabetes occurring in developing countries [2]. These figures indicate an urgent need to detect and prevent chronic disease at an early stage. The cardiovascular disea