Improved empirical likelihood inference and variable selection for generalized linear models with longitudinal nonignorable dropouts
Lei Wang1 · Wei Ma1
Received: 28 May 2019 / Revised: 16 April 2020
© The Institute of Statistical Mathematics, Tokyo 2020
Abstract
In this paper, we propose improved statistical inference and variable selection methods for generalized linear models, based on an empirical likelihood (EL) approach that accommodates both within-subject correlations and nonignorable dropouts. We first apply the generalized method of moments to estimate the parameters of the nonignorable dropout propensity based on an instrument. Inverse probability weighting is then applied to obtain bias-corrected generalized estimating equations (GEEs), and we borrow the ideas of the quadratic inference function and the hybrid GEE, respectively, to construct empirical likelihood procedures for longitudinal data with nonignorable dropouts. Two different classes of estimators and their confidence regions are derived. Further, a penalized EL method and an algorithm for variable selection are investigated. The finite-sample performance of the proposed estimators is studied through simulation, and an application to an HIV-CD4 data set is also presented.
Keywords Inverse probability weighting · Missing not at random · Nonresponse instrument · Quadratic inference function · Variable selection
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10463-020-00761-4) contains supplementary material, which is available to authorized users.
* Lei Wang
Wei Ma
1 School of Statistics and Data Science, LPMC & KLMDASR, Nankai University, Tianjin 300071, China
1 Introduction
In research areas such as medicine, population health, economics, social sciences and sample surveys, data are often collected from every sampled subject at many time points; such data are referred to as longitudinal data. Let $y_i = (y_{i1}, y_{i2}, \ldots, y_{im_i})^T$ be an $m_i$-dimensional vector of the $i$th subject's responses and $x_i = (x_{i1}, \ldots, x_{im_i})^T$ be an $(m_i \times p)$-dimensional matrix of covariates associated with $y_i$, $i = 1, \ldots, n$, where $m_i$ is also called the cluster size of the $i$th cluster. Assume that the first and second moments of $y_{ij}$ are modeled by
$$g(\mu_{ij}) = x_{ij}^T \boldsymbol{\beta}, \qquad \mathrm{Var}(y_{ij}) = \phi\, v(\mu_{ij}), \tag{1}$$
where $\boldsymbol{\beta}$ is a $p$-dimensional parameter vector, $g(\cdot)$ is a known link function, $\mu_{ij} = E(y_{ij})$, $\phi$ is a dispersion parameter, $v(\cdot)$ is a known variance function and $a^T$ is the transpose of $a$. For longitudinal data, it has been recognized that the within-cluster correlation structure plays an important role, and a major question is how to take this correlation structure into account to improve estimation efficiency. However, since the underlying correlation structure is difficult to describe and specify, a naive and simple approach is to use a working model (see You et al. (2006) and Xue and Zhu (2007) and references therein), which may lose some efficiency when strong c
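As a purely illustrative aside on model (1), the following sketch evaluates the marginal mean and variance of one subject's responses for two common choices of link and variance function. The function name `marginal_moments`, the covariate values and the value of the parameter vector are hypothetical and are not taken from the paper; the sketch only instantiates the two equations in (1) and does not address correlation or dropouts.

```python
import numpy as np

def marginal_moments(x_i, beta, inv_link, variance_fn, phi=1.0):
    """Return (mu_i, var_i) for one subject under model (1).

    x_i         : (m_i, p) covariate matrix of the subject
    beta        : (p,) parameter vector
    inv_link    : inverse of the link function g
    variance_fn : variance function v
    phi         : dispersion parameter
    """
    eta = x_i @ beta               # linear predictors x_ij^T beta, j = 1..m_i
    mu = inv_link(eta)             # mu_ij = g^{-1}(x_ij^T beta)
    var = phi * variance_fn(mu)    # Var(y_ij) = phi * v(mu_ij)
    return mu, var

# Hypothetical subject with m_i = 3 visits and p = 2 covariates
x_i = np.array([[1.0, 0.0],
                [1.0, 0.5],
                [1.0, 1.0]])
beta = np.array([0.0, 1.0])        # assumed value, for illustration only

# Log link with Poisson-type variance function v(mu) = mu
mu_log, var_log = marginal_moments(x_i, beta, np.exp, lambda m: m)

# Logit link with Bernoulli-type variance function v(mu) = mu * (1 - mu)
expit = lambda eta: 1.0 / (1.0 + np.exp(-eta))
mu_logit, var_logit = marginal_moments(x_i, beta, expit, lambda m: m * (1.0 - m))
```

Passing the inverse link and the variance function as arguments keeps the sketch agnostic about the particular generalized linear model, mirroring the fact that (1) specifies only the first two moments.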