Cluster analysis application to identify groups of individuals with high health expenditures

Joshua Agterberg1 · Fanghao Zhong2 · Richard Crabb3 · Marjorie Rosenberg3

Received: 13 August 2019 / Revised: 23 June 2020 / Accepted: 17 July 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract We compare and demonstrate the effectiveness of two clustering methods, with the main purpose of identifying characteristic profiles of high utilizers of health care. We use three mutually independent sets of longitudinal data that are nationally representative of the US adult working-age civilian non-institutionalized population. We compare k-means, a commonly used clustering method, with a k-medoids algorithm called Partitioning Around Medoids (PAM). For both methods, we use one cohort of data to form clusters of individuals with similar characteristics. We examine the composition of the three clusters with the highest average total expenditures in this cohort, and we also examine the health expenditure distributions for this cohort over the following two years. We validate the approach by applying the cluster centers to two other cohorts of similar data. We form clusters based on demographic, economic, and health-related characteristics that are commonly used in studies of health care utilization. We demonstrate the consistency of our results across the three cohorts and across different types of health expenditures, such as office-based/outpatient and drug expenditures. Clusters can also be formed with other, more homogeneous data, such as Medicaid, Medicare, employer-sponsored insurance, or individual private plans issued under the Affordable Care Act. This approach can be used to follow similar groups over time for other types of health outcomes.

Keywords Unsupervised machine learning · Goodall similarities · k-Means · Partitioning around medoids · Predicting rare events
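The abstract contrasts k-means with PAM and validates the clusters by applying their centers to other cohorts. The following is a minimal pure-Python sketch of the PAM side only, under stated assumptions: categorical characteristics are compared with a Goodall-type similarity (here the variant often labelled "Goodall3"; the paper's exact variant, features, and k are not given in this section), PAM runs on the resulting dissimilarity matrix, and "applying the centers" is taken to mean assigning each individual in a new cohort to its least dissimilar medoid. The functions `goodall_dissimilarity`, `pam`, and `assign` and the toy data are hypothetical illustrations, not the authors' code.

```python
from collections import Counter


def goodall_dissimilarity(train):
    """Return d(x, y) = 1 - S(x, y) for a Goodall-type similarity S.

    Assumption: the "Goodall3" variant, where a match on attribute k scores
    1 - p_k^2(value) and a mismatch scores 0, averaged over attributes.
    Value frequencies are estimated from the training cohort only.
    """
    n = len(train)
    n_attrs = len(train[0])
    counts = [Counter(row[k] for row in train) for k in range(n_attrs)]

    def dissim(x, y):
        sim = 0.0
        for k in range(n_attrs):
            if x[k] == y[k]:
                f = counts[k][x[k]]  # 0 for values never seen in training
                p2 = f * (f - 1) / (n * (n - 1))  # chance two random rows share it
                sim += 1.0 - p2  # rarer shared values count as more similar
        return 1.0 - sim / n_attrs

    return dissim


def pam(dist, k):
    """Partitioning Around Medoids on a precomputed n x n dissimilarity matrix."""
    n = len(dist)

    def cost(medoids):
        # Total dissimilarity of every point to its closest medoid.
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    # BUILD: add medoids one at a time, each time choosing the greedy best.
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(medoids + [c])))

    # SWAP: apply the best medoid/non-medoid exchange until none lowers the cost.
    current = cost(medoids)
    while True:
        best_cost, best_trial = current, None
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:i] + [c] + medoids[i + 1:]
                trial_cost = cost(trial)
                if trial_cost < best_cost - 1e-12:
                    best_cost, best_trial = trial_cost, trial
        if best_trial is None:
            return medoids, current
        medoids, current = best_trial, best_cost


def assign(points, centers, dissim):
    """Label each point with the index of its least dissimilar center."""
    return [min(range(len(centers)), key=lambda j: dissim(p, centers[j]))
            for p in points]


# Hypothetical toy cohorts: (age band, sex, self-rated health).
train = [
    ("18-34", "F", "excellent"), ("18-34", "F", "excellent"),
    ("18-34", "M", "excellent"), ("55-64", "M", "poor"),
    ("55-64", "M", "poor"), ("55-64", "F", "poor"),
]
new_cohort = [("18-34", "M", "excellent"), ("55-64", "F", "poor")]

dissim = goodall_dissimilarity(train)
dist = [[dissim(x, y) for y in train] for x in train]
medoids, total_cost = pam(dist, k=2)
centers = [train[m] for m in medoids]

train_labels = assign(train, centers, dissim)
new_labels = assign(new_cohort, centers, dissim)  # reuse the fitted centers
print(centers, train_labels, new_labels)
```

Because medoids are actual individuals from the training cohort, they can be carried unchanged to a later cohort, whereas k-means centers are averages and require numerically encoded features. Under this Goodall variant, identical rows have zero dissimilarity only when their shared values are rare in the training data; PAM tolerates this because it works directly on the dissimilarity matrix and does not require a metric.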

Corresponding authors: Joshua Agterberg · Marjorie Rosenberg

1 Johns Hopkins University, Whitehead Hall, 3400 N Charles St, Baltimore, MD 21218, USA
2 New York University, New York, USA
3 University of Wisconsin-Madison, Madison, USA

Health Services and Outcomes Research Methodology

1 Introduction

Since the 1980s, researchers have studied high-cost utilizers of health care (Zook and Moore 1980; Wammes et al. 2018). Today it is estimated that 1% of individuals account for 20% of total health care expenditures, while 5% of individuals account for 50% (Mitchell 2016; Long et al. 2017). In general, identification of high utilizers of health care has relied on supervised learning methods, in which the outcome is known (e.g., Crawford et al. 2005; Fleishman and Cohen 2010; Charlson et al. 2014; Shenas et al. 2014; Wherry et al. 2014; Boscardin et al. 2015; Hamad et al. 2015; Bayliss et al. 2016; Peltz et al. 2016; Lee et al. 2017; Wammes et al. 2018; Kim and Rosenberg 2020). In this paper we p
