ORIGINAL ARTICLE

A comparison of regularized logistic regression and random forest machine learning models for daytime diagnosis of obstructive sleep apnea

Farahnaz Hajipour 1 & Mohammad Jafari Jozani 2 & Zahra Moussavi 1,3

Received: 26 October 2019 / Accepted: 23 May 2020
© International Federation for Medical and Biological Engineering 2020

Abstract

A major challenge in big and high-dimensional data analysis is related to the classification and prediction of the variables of interest by characterizing the relationships between the characteristic factors and predictors. This study aims to assess the utility of two important machine-learning techniques to classify subjects with obstructive sleep apnea (OSA) using their daytime tracheal breathing sounds. We evaluate and compare the performance of the random forest (RF) and regularized logistic regression (LR) as feature selection tools and classification approaches for wakefulness OSA screening. Results show that the RF, which is a low-variance committee-based approach, outperforms the regularized LR in terms of blind-testing accuracy, specificity, and sensitivity with 3.5%, 2.4%, and 3.7% improvement, respectively. However, the regularized LR was found to be faster than the RF and resulted in a more parsimonious model. Consequently, both the RF and regularized LR feature reduction and classification approaches are qualified to be applied for the daytime OSA screening studies, depending on the nature of data and applications’ purposes.
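For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how accuracy, sensitivity, and specificity are computed from binary predictions. It is not the authors' code: the function name `screening_metrics`, the label convention (1 = OSA, 0 = non-OSA), and the example labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, sensitivity, and specificity for binary labels.

    Assumed convention (not from the paper): 1 = OSA, 0 = non-OSA.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Undefined when a class is absent; report NaN instead of raising.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Hypothetical labels for ten subjects (illustrative only, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = screening_metrics(y_true, y_pred)
print(metrics)
```

In a blind-testing setup such as the one the abstract refers to, these quantities would be computed on subjects held out from both feature selection and model fitting.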

Keywords Feature selection · Classification · Regularized logistic regression · LASSO · Random forest · Obstructive sleep apnea

Abbreviations
AHI Apnea-Hypopnea Index
ANOVA Analysis of variance
AUC Area under the curve
CI Confidence interval
LASSO Least absolute shrinkage and selection operator
LR Logistic regression
MANOVA Multivariate analysis of variance
NC Neck circumference
OSA Obstructive sleep apnea
OOB Out-of-bag
PSD Power spectrum density
PSG Polysomnography
ROC Receiver operating characteristics
RF Random forest
TBS Tracheal breathing sounds

* Farahnaz Hajipour [email protected]

1 Biomedical Engineering Program, University of Manitoba, Winnipeg, Canada
2 Department of Statistics, University of Manitoba, Winnipeg, Canada
3 Electrical and Computer Engineering Department, University of Manitoba, Winnipeg, Canada

1 Introduction

Nowadays, the world is in the “Big Data” era, as most available data are stored [1]. For example, in medical fields, the stored data includes patients’ personal, family, and demographic information; history of their diseases; and their various medical tests. Big Data analysis requires a fair knowledge of the data being processed and proper use of intelligent algorithms to extract appropriate knowledge from the data regarding the relationships between predictors and variables of interest, and perform classification and prediction. When dealing with large and high-dimensional datasets, it is possible to extract a considerable number of features from data. To build parsimonious models tha
