Common sampling and modeling approaches to analyzing readmission risk that ignore clustering produce misleading results
(2020) 20:281
RESEARCH ARTICLE
Open Access
Huaqing Zhao¹, Samuel Tanner², Sherita H. Golden³, Susan G. Fisher¹ and Daniel J. Rubin⁴*
Abstract
Background: There is little consensus on how to sample hospitalizations and analyze multiple variables to model readmission risk. The purpose of this study was to compare readmission rates and the accuracy of predictive models based on different sampling and multivariable modeling approaches.
Methods: We conducted a retrospective cohort study of 17,284 adult diabetes patients with 44,203 discharges from an urban academic medical center between 1/1/2004 and 12/31/2012. Models for all-cause 30-day readmission were developed by four strategies: logistic regression using the first discharge per patient (LR-first), logistic regression using all discharges (LR-all), generalized estimating equations (GEE) using all discharges, and cluster-weighted GEE (CWGEE) using all discharges. Multiple sets of models were developed and internally validated across a range of sample sizes.
Results: The readmission rate was 10.2% among first discharges and 20.3% among all discharges, revealing that sampling only first discharges underestimates a population’s readmission rate. Number of discharges was highly correlated with number of readmissions (r = 0.87, P < 0.001). Accounting for clustering with GEE and CWGEE yielded more conservative estimates of model performance than LR-all. LR-first produced falsely optimistic Brier scores. Model performance was unstable below samples of 6000–8000 discharges and stable in larger samples. GEE and CWGEE performed better in larger samples than in smaller samples.
Conclusions: Hospital readmission risk models should be based on all discharges, as opposed to just the first discharge per patient, and should use methods that account for clustered data.
Keywords: Logistic models, Patient readmission, Predictive modeling, Sampling strategies, Clustering
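To make the sampling effect described in the abstract concrete, the following minimal Python sketch compares the readmission rate computed from first discharges only with the rate computed from all discharges, and also computes a Brier score. The toy data, the patient labels, and all function names are hypothetical and are not taken from the study. The inverse-cluster-size weighting is an assumption about how cluster weighting is commonly done (each discharge weighted by one over that patient's number of discharges); the abstract does not specify the weights the authors used.

```python
from collections import defaultdict

# Hypothetical toy data, not from the study:
# (patient_id, discharge_order, readmitted_within_30_days)
discharges = [
    ("A", 1, 0),
    ("B", 1, 0),
    ("C", 1, 1), ("C", 2, 1), ("C", 3, 0),
    ("D", 1, 0), ("D", 2, 1), ("D", 3, 1), ("D", 4, 1), ("D", 5, 0),
]


def readmission_rate(rows):
    """Unweighted share of discharges followed by a 30-day readmission."""
    return sum(outcome for _, _, outcome in rows) / len(rows)


def cluster_weighted_rate(rows):
    """Rate with each discharge weighted by 1 / (discharges for that patient).

    Assumption: inverse-cluster-size weights, so every patient contributes
    the same total weight regardless of how often they were discharged.
    """
    cluster_size = defaultdict(int)
    for patient_id, _, _ in rows:
        cluster_size[patient_id] += 1
    weights = [1.0 / cluster_size[patient_id] for patient_id, _, _ in rows]
    weighted_events = sum(w * outcome for w, (_, _, outcome) in zip(weights, rows))
    return weighted_events / sum(weights)


def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted_probs, outcomes)) / len(outcomes)


# Sampling strategy 1: keep only each patient's first discharge.
first_only = [row for row in discharges if row[1] == 1]
rate_first = readmission_rate(first_only)

# Sampling strategy 2: keep every discharge.
rate_all = readmission_rate(discharges)

# Every discharge, but down-weighting patients with many discharges.
rate_cluster_weighted = cluster_weighted_rate(discharges)

print(f"first discharges only: {rate_first:.3f}")
print(f"all discharges:        {rate_all:.3f}")
print(f"cluster-weighted:      {rate_cluster_weighted:.3f}")
```

In this toy sample the patients with the most discharges also account for most readmissions, so the first-discharge rate falls well below the all-discharge rate, which is the same direction as the 10.2% versus 20.3% gap reported above. The sketch covers descriptive rates only; it does not fit the LR, GEE, or CWGEE models themselves.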
* Correspondence: [email protected]
⁴ Lewis Katz School of Medicine at Temple University, Section of Endocrinology, Diabetes, and Metabolism, 3322 N. Broad St., Ste 205, Philadelphia, PA 19140, USA
Full list of author information is available at the end of the article
© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obt