Classification and prediction of diabetes disease using machine learning paradigm



Health Information Science and Systems

RESEARCH

Classification and prediction of diabetes disease using machine learning paradigm

Md. Maniruzzaman1,2*, Md. Jahanur Rahman2, Benojir Ahammed1 and Md. Menhazul Abedin1

Abstract

Background and objectives: Diabetes is a chronic disease characterized by high blood sugar. It may cause many complications such as stroke, kidney failure, and heart attack. About 422 million people worldwide were affected by diabetes in 2014, and the figure is projected to reach 642 million by 2040. The main objective of this study is to develop a machine learning (ML)-based system for predicting diabetic patients.

Materials and methods: Logistic regression (LR) is used to identify the risk factors for diabetes based on p value and odds ratio (OR). We adopted four classifiers, naïve Bayes (NB), decision tree (DT), Adaboost (AB), and random forest (RF), to predict diabetic patients. Three partition protocols (K2, K5, and K10) were also adopted, and each protocol was repeated for 20 trials. The performance of these classifiers is evaluated using accuracy (ACC) and area under the curve (AUC).

Results: We used a diabetes dataset derived from the National Health and Nutrition Examination Survey conducted in 2009–2012. The dataset consists of 6561 respondents, of whom 657 are diabetic and 5904 are controls. The LR model identifies 7 of the 14 factors (age, education, BMI, systolic BP, diastolic BP, direct cholesterol, and total cholesterol) as risk factors for diabetes. The overall ACC of the ML-based system is 90.62%. The combination of LR-based feature selection and the RF-based classifier gives 94.25% ACC and 0.95 AUC for the K10 protocol.

Conclusion: The combination of LR-based feature selection and the RF-based classifier performs better than the others considered. This combination will be very helpful for predicting diabetic patients.

Keywords: Diabetes, Classification, Machine learning, Naïve Bayes, Decision tree, Random forest, Adaboost

Introduction

Diabetes mellitus (DM), commonly known as diabetes, is a group of metabolic disorders characterized by high blood sugar [1–3].
Diabetes can lead to many serious long-term complications such as cardiovascular disease, stroke, kidney failure, heart attack, peripheral arterial disease, and damage to blood vessels and nerves [4, 5]. About 122 million people worldwide were affected by diabetes in 1980, and this figure reached about 422 million in 2014 [6]. It is projected to reach about 642 million by 2040 [7]. Moreover, diabetes was the direct cause of about 1.6 million deaths [8]. These figures are alarming. The number of diabetic patients is

*Correspondence: [email protected]
1 Statistics Discipline, Khulna University, Khulna 9208, Bangladesh
Full list of author information is available at the end of the article
© Springer Nature Switzerland AG 2020.

increasing day by day, and as a result deaths are also increasing. Diabetes can be divided into three types: (i) type I diabetes (T1D), (ii) type II diabetes (T2D), and (iii) gestati