Cluster analysis application to identify groups of individuals with high health expenditures
Joshua Agterberg¹ · Fanghao Zhong² · Richard Crabb³ · Marjorie Rosenberg³
Received: 13 August 2019 / Revised: 23 June 2020 / Accepted: 17 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
We compare and demonstrate the effectiveness of two clustering methods, with the main purpose of identifying characteristic profiles of high utilizers of health care. We use three mutually independent sets of longitudinal data that are nationally representative of the US adult, working-age, civilian, non-institutionalized population. We compare k-means, a commonly used clustering method, with a k-medoids algorithm called Partitioning Around Medoids (PAM). Using one cohort of data, we form clusters of individuals with similar characteristics under both methods. The clusters are based on demographic, economic, and health-related characteristics that are commonly used in studies of health care utilization. We then examine the characteristic composition of the three clusters with the highest average total expenditures in this cohort, as well as the cohort's health expenditure distributions over the following two years. We validate the approach by applying the cluster centers to two other cohorts of similar data. We demonstrate the consistency of our results across the three cohorts and across different types of health expenditures, such as office-based/outpatient and drug expenditures. Clusters could also be formed from other, more homogeneous data, such as Medicaid, Medicare, employer-sponsored insurance, or individual private plans issued under the Affordable Care Act. The approach can likewise be used to follow similar groups over time for other types of health outcomes.
Keywords: Unsupervised machine learning · Goodall similarities · k-Means · Partitioning around medoids · Predicting rare events
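To make the contrast between the two methods and the validation step more concrete, the following is a minimal Python sketch, not the authors' code. It runs a simplified PAM swap search over a precomputed dissimilarity matrix for one "training" cohort and then assigns members of a separate cohort to the nearest fitted medoid. The function names (`pam`, `assign_to_medoids`, `total_cost`), the Manhattan dissimilarity, the naive initialization, and the toy data are all hypothetical illustrations; the paper's own dissimilarity (its keywords mention Goodall similarities) is not reproduced here.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def total_cost(dist: List[List[float]], medoids: Sequence[int]) -> float:
    """Sum over all points of the dissimilarity to their closest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Simplified PAM (hypothetical sketch): start from the first k indices,
    then repeatedly apply the single best (medoid, non-medoid) swap until no
    swap lowers the total cost."""
    n = len(dist)
    medoids = list(range(k))  # naive start; classic PAM uses a BUILD phase here
    cost = total_cost(dist, medoids)
    for _ in range(max_iter):
        best_swap, best_cost = None, cost
        for i, h in product(range(k), range(n)):
            if h in medoids:
                continue
            candidate = medoids[:i] + [h] + medoids[i + 1:]
            c = total_cost(dist, candidate)
            if c < best_cost:
                best_swap, best_cost = candidate, c
        if best_swap is None:  # no improving swap: local optimum reached
            break
        medoids, cost = best_swap, best_cost
    return medoids


def assign_to_medoids(points: Sequence[Point], centers: Sequence[Point],
                      d: Callable[[Point, Point], float]) -> List[int]:
    """Label each point (e.g. from a new cohort) with its nearest center's index."""
    return [min(range(len(centers)), key=lambda j: d(p, centers[j])) for p in points]


def manhattan(a: Point, b: Point) -> float:
    """Hypothetical stand-in dissimilarity for numeric toy data."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy "training cohort" with two obvious groups (illustrative data only).
train = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
dist = [[manhattan(a, b) for b in train] for a in train]
medoid_idx = pam(dist, k=2)
centers = [train[i] for i in medoid_idx]

# Apply the fitted centers to a separate "validation cohort".
new_cohort = [(1, 1), (9, 10)]
labels = assign_to_medoids(new_cohort, centers, manhattan)
```

Unlike k-means, whose centers are coordinate-wise means that need not correspond to any observed individual, each PAM center is an actual member of the training cohort, and PAM only needs pairwise dissimilarities. This is what allows a similarity measure for categorical data, such as a Goodall-type measure, to be plugged in.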
Corresponding authors: Joshua Agterberg, Marjorie Rosenberg
¹ Johns Hopkins University, Whitehead Hall, 3400 N Charles St, Baltimore, MD 21218, USA
² New York University, New York, USA
³ University of Wisconsin-Madison, Madison, USA
Health Services and Outcomes Research Methodology
1 Introduction
Since the 1980s, researchers have studied high-cost utilizers of health care (Zook and Moore 1980; Wammes et al. 2018). Today it is estimated that 1% of individuals account for 20% of total health care expenditures, while 5% of individuals account for 50% (Mitchell 2016; Long et al. 2017). Identification of high utilizers of health care has generally relied on supervised learning methods, in which the outcome is known (e.g., Crawford et al. 2005; Fleishman and Cohen 2010; Charlson et al. 2014; Shenas et al. 2014; Wherry et al. 2014; Boscardin et al. 2015; Hamad et al. 2015; Bayliss et al. 2016; Peltz et al. 2016; Lee et al. 2017; Wammes et al. 2018; Kim and Rosenberg 2020). In this paper we p