Regression and subgroup detection for heterogeneous samples


Baosheng Liang (1,2) · Peng Wu (3) · Xingwei Tong (3) · Yanping Qiu (4,5)

Received: 12 February 2019 / Accepted: 1 February 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Regression analysis of heterogeneous samples with subgroup structure is essential to the development of precision medicine. In practice, this task is often challenging because the subgroup labels are not known in advance. Detecting subgroups with similar characteristics therefore becomes critical, and it often determines the accuracy of the regression analysis. In this article, we investigate a new framework for detecting subgroups that have similar characteristics in the feature space and similar treatment effects. The key idea is to incorporate K-means clustering into the regression framework of concave pairwise fusion, so that the regression and subgroup detection tasks can be performed simultaneously. Our method is specifically tailored to situations where the sample is not homogeneous, in the sense that the response variables in different domains of the feature space are generated through different mechanisms.

Keywords Concave fusion · Heterogeneous problem · K-means clustering · Regression · Subgroup detection
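To make the setting concrete, the following is a minimal sketch of a naive two-stage baseline: cluster on the feature first, then fit a separate regression in each cluster. It is not the simultaneous concave-fusion method of the article. The simulated data, the effect sizes, and the helper names (`kmeans_1d`, `ols_slope`, `simulate`, `two_stage_effects`) are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

def kmeans_1d(xs, k=2, iters=50):
    """Plain Lloyd's algorithm on a scalar feature; returns one label per point."""
    srt = sorted(xs)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [x for x, g in zip(xs, labels) if g == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def ols_slope(t, y):
    """Slope of a simple least-squares fit of y on t (with intercept)."""
    tb, yb = mean(t), mean(y)
    sxx = sum((a - tb) ** 2 for a in t)
    sxy = sum((a - tb) * (b - yb) for a, b in zip(t, y))
    return sxy / sxx

def simulate(n=400, seed=0):
    """Two latent subgroups, separated in feature space, with opposite effects."""
    rng = random.Random(seed)
    true_effect = {0: 1.0, 1: -2.0}  # hypothetical treatment effects
    rows = []
    for i in range(n):
        g = i % 2                                    # latent subgroup label
        x = rng.gauss(-3.0 if g == 0 else 3.0, 0.5)  # feature
        t = rng.randint(0, 1)                        # treatment indicator
        y = true_effect[g] * t + rng.gauss(0.0, 0.3)  # response
        rows.append((x, t, y))
    return rows

def two_stage_effects(rows, k=2):
    """Stage 1: K-means on the feature. Stage 2: per-cluster regression on t."""
    labels = kmeans_1d([r[0] for r in rows], k)
    effects = {}
    for j in range(k):
        t = [r[1] for r, g in zip(rows, labels) if g == j]
        y = [r[2] for r, g in zip(rows, labels) if g == j]
        effects[j] = ols_slope(t, y)
    return labels, effects

rows = simulate()
labels, effects = two_stage_effects(rows)
pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])
print("per-cluster effects:", effects)
print("pooled effect:", pooled)
```

In this toy setup a single pooled regression averages the two opposite effects, while the per-cluster fits recover them separately. The two-stage baseline ignores the response when forming clusters; the framework described in the abstract instead performs clustering and regression jointly.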

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00180-020-00965-5) contains supplementary material, which is available to authorized users.

Corresponding author: Yanping Qiu

1 Department of Biostatistics, Health Science Center, Peking University, Beijing 100191, People’s Republic of China
2 Institute of Medical Technology, Peking University, Beijing, People’s Republic of China
3 School of Statistics, Beijing Normal University, Beijing 101875, People’s Republic of China
4 School of Statistics, Renmin University of China, Beijing 100872, People’s Republic of China
5 Statistics & Decision Sciences, Janssen Research & Development, Beijing 100025, People’s Republic of China


1 Introduction

One of the most important issues in precision medicine is the regression analysis of heterogeneous samples with subgroup structures. Clinically, patients with different genotypes and phenotypes often show heterogeneous responses to the same treatment (Sorensen 1996), and in practice, unobserved confounders can also contribute to heterogeneous treatment effects. To tailor a proper treatment for patients from different subgroups, it is crucial to identify the subgroup label of each patient and then prescribe the treatment that is optimal for that subgroup. One underlying assumption of such a procedure is that patients from the same subgroup have similar characteristics in the feature space and identical treatment effects on the response. Therefore, from the perspective of regression analysis, understanding the treatment heterogeneity, figuring out the subgroup structures, and estimating the treatment effect of each subgroup are essential to the success of precision
