Using Big Data-machine learning models for diabetes prediction and flight delays analytics


Open Access

RESEARCH

Thérence Nibareke* and Jalal Laassiri

*Correspondence: [email protected]
Informatics Systems and Optimization Laboratory, Ibn Tofail University, Kenitra, Morocco

Abstract

Introduction: Large volumes of data are now generated daily at a high rate. Data from health systems, social networks, finance, government, marketing and bank transactions, as well as from sensors and smart devices, keep increasing, so the tools and models used to process them have to be optimized. In this paper we applied and compared Machine Learning algorithms (Linear Regression, Naïve Bayes, Decision Tree) to predict diabetes. Furthermore, we performed analytics on flight delays. The main contribution of this paper is to give an overview of Big Data tools and machine learning models. We highlight some metrics that allow us to choose a more accurate model. We predicted diabetes using three machine learning models and then compared their performance. We also analyzed flight delays and produced a dashboard that can help managers of flight companies get a 360° view of their flights and take strategic decisions.

Case description: We applied three Machine Learning algorithms to predict diabetes and compared their performance to see which model gives the best results. We performed analytics on flight datasets to support decision making and predict flight delays.

Discussion and evaluation: The experiment shows that Linear Regression, Naïve Bayes and Decision Tree give the same accuracy (0.766), but Decision Tree outperforms the two other models with the greatest score (1) and the smallest error (0). For the flight delays analytics, the model could show, for example, the airport that recorded the most flight delays.

Conclusions: Several tools and machine learning models for big data analytics have been discussed in this paper. We concluded that, for the same dataset, the model used for prediction has to be chosen carefully. In future work, we will test different models in other fields (climate, banking, insurance).

Keywords: Big Data, Hadoop, Spark, HBase, Machine learning, Data analytics, Accuracy, K-Nearest Neighbor, K-means
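The comparison described in the abstract relies on metrics such as accuracy and error. As a minimal sketch of how such metrics can be computed and compared across models, the Python snippet below uses only the standard library. The labels, the predictions and the model names `model_a` and `model_b` are hypothetical illustrations, not data or results from the paper, and the abstract does not state which tooling or error definition was actually used; mean absolute error is assumed here.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average absolute difference between true and predicted labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels (1 = diabetic, 0 = not diabetic); NOT data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predictions from two placeholder models.
predictions: Dict[str, Sequence[int]] = {
    "model_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 1, 1, 0, 0, 1, 0],
}

# One row of metrics per model, so the models can be compared side by side.
report = {
    name: {
        "accuracy": accuracy(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
    }
    for name, y_pred in predictions.items()
}

# Pick the model with the highest accuracy.
best = max(report, key=lambda name: report[name]["accuracy"])

for name, metrics in report.items():
    print(name, metrics)
print("best:", best)
```

For 0/1 labels, the mean absolute error equals one minus the accuracy, so the two metrics only carry different information when they are computed on different data (for example training versus test sets) or on non-binary outputs.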

Introduction

In recent decades, increasingly large amounts of data have been generated from a variety of sources. The amount of data generated per day on the Internet has already exceeded two exabytes. Every minute, 72 h of video are uploaded to YouTube, around 30,000 new

© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the

