Regression and subgroup detection for heterogeneous samples


Baosheng Liang1,2 · Peng Wu3 · Xingwei Tong3 · Yanping Qiu4,5

Received: 12 February 2019 / Accepted: 1 February 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Regression analysis of heterogeneous samples with subgroup structure is essential to the development of precision medicine. In practice, this task is often challenging because the subgroup labels are not known in advance. Detecting subgroups with similar characteristics therefore becomes critical, and it often determines the accuracy of the regression analysis. In this article, we investigate a new framework for detecting subgroups that have similar characteristics in feature space and similar treatment effects. The key idea is to incorporate K-means clustering into the regression framework of concave pairwise fusion, so that the regression and subgroup detection tasks can be performed simultaneously. Our method is specifically tailored to situations where the sample is not homogeneous, in the sense that the response variables in different domains of feature space are generated through different mechanisms.

Keywords Concave fusion · Heterogeneous problem · K-means clustering · Regression · Subgroup detection

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00180-020-00965-5) contains supplementary material, which is available to authorized users.

Corresponding author: Yanping Qiu

1 Department of Biostatistics, Health Science Center, Peking University, Beijing 100191, People’s Republic of China
2 Institute of Medical Technology, Peking University, Beijing, People’s Republic of China
3 School of Statistics, Beijing Normal University, Beijing 101875, People’s Republic of China
4 School of Statistics, Renmin University of China, Beijing 100872, People’s Republic of China
5 Statistics & Decision Sciences, Janssen Research & Development, Beijing 100025, People’s Republic of China


B. Liang et al.

1 Introduction

One of the most important issues in precision medicine is the regression analysis of heterogeneous samples with subgroup structures. Clinically, patients with different genotypes and phenotypes often show heterogeneous responses to the same treatment (Sorensen 1996), and in practice, unobserved confounders can also contribute to heterogeneous treatment effects. To tailor a proper treatment to patients from different subgroups, it is crucial to identify the subgroup label of each patient and then prescribe the treatment that is optimal for that subgroup. One underlying assumption of such a procedure is that patients from the same subgroup have similar characteristics in feature space and identical treatment effects on responses. Therefore, from the perspective of regression analysis, understanding the treatment heterogeneity, figuring out the subgroup structures, and estimating the treatment effect of each subgroup are essential to the success of precision