Use of Machine Learning Approaches in Clinical Epidemiological Research of Diabetes

DIABETES EPIDEMIOLOGY (E SELVIN AND K FOTI, SECTION EDITORS)

Sanjay Basu 1,2,3 & Karl T. Johnson 4 & Seth A. Berkowitz 4

Accepted: 26 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Purpose of Review Machine learning approaches, which seek to predict outcomes or classify patient features by recognizing patterns in large datasets, are increasingly applied to clinical epidemiology research on diabetes. Given its novelty and emergence in fields outside of biomedical research, machine learning terminology, techniques, and research findings may be unfamiliar to diabetes researchers. Our aim was to present the use of machine learning approaches in an approachable way, drawing from clinical epidemiological research in diabetes published from 1 Jan 2017 to 1 June 2020.

Recent Findings Machine learning approaches using tree-based learners, which produce decision trees to help guide clinical interventions, frequently have higher sensitivity and specificity than traditional regression models for risk prediction. Machine learning approaches using neural networking and “deep learning” can be applied to medical image data, particularly for the identification and staging of diabetic retinopathy and skin ulcers. Among the machine learning approaches reviewed, researchers identified new strategies to develop standard datasets for rigorous comparisons across older and newer approaches, methods to illustrate how a machine learner was treating underlying data, and approaches to improve the transparency of the machine learning process.

Summary Machine learning approaches have the potential to improve risk stratification and outcome prediction for clinical epidemiology applications. Achieving this potential would be facilitated by use of universal open-source datasets for fair comparisons. More work remains in the application of strategies to communicate how the machine learners are generating their predictions.

Keywords Machine learning · Diabetes

This article is part of the Topical Collection on Diabetes Epidemiology

* Sanjay Basu [email protected]

1 Center for Primary Care, Harvard Medical School, Boston, MA, USA
2 Research and Population Health, Collective Health, San Francisco, CA, USA
3 School of Public Health, Imperial College London, London SW7, UK
4 General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

Introduction

Machine learning refers to a suite of strategies designed to predict an outcome, cluster data, or assist in decision-making, typically by repeatedly resampling from datasets, honing the precision or accuracy of a learning algorithm (“learner”) through repeated model fitting and error correction [1]. Because machine learning requires iterative and repeated sampling of data, it is suited to large datasets, typically involving many covariates and a large sample size. In turn, machine learning methods have the potential