An Efficient Multiple Imputation Approach for Estimating Equations with Response Missing at Random and High-Dimensional Covariates∗

WANG Lei · SUN Siying · XIA Zheng

DOI: 10.1007/s11424-020-9133-9
Received: 21 April 2019 / Revised: 27 August 2019
© The Editorial Office of JSSC & Springer-Verlag GmbH Germany 2020

Abstract Empirical-likelihood-based inference for parameters defined by the general estimating equations of Qin and Lawless (1994) remains an active research topic. When the response is missing at random (MAR) and the dimension of the covariates is not low, the authors propose a two-stage estimation procedure that uses dimension-reduced kernel estimators in conjunction with an unbiased estimating function based on augmented inverse probability weighting and multiple imputation (AIPW-MI) methods. The authors show that the resulting estimator achieves consistency and asymptotic normality. In addition, the corresponding empirical likelihood ratio statistics asymptotically follow central chi-square distributions when evaluated at the true parameter. The finite-sample performance of the proposed estimator is studied through simulation, and an application to an HIV-CD4 data set is also presented.

Keywords Consistency and asymptotic normality, dimension reduction, kernel-assisted, missing at random, multiple imputation.

1 Introduction

Consider statistical inference on a p-dimensional parameter vector θ₀ ∈ Θ defined to be the unique solution to an s-dimensional estimating equation

$$E\{g(X, Y, \theta)\} = 0, \quad \theta \in \Theta, \tag{1}$$

where Θ is the parameter space, Y is a real-valued outcome or response, X is a d-dimensional covariate vector, and g is a known s-dimensional continuously differentiable function with s ≥ p.

WANG Lei · SUN Siying · XIA Zheng
School of Statistics and Data Science, LPMC & KLMDASR, Nankai University, Tianjin 300071, China. Email: [email protected].
∗ This paper was supported by the National Natural Science Foundation of China under Grant Nos. 11871287, 11501208, 11771144, 11801359, the Natural Science Foundation of Tianjin under Grant No. 18JCYBJC41100, Fundamental Research Funds for the Central Universities and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin. The first two authors contributed equally to this work.
This paper was recommended for publication by Editor SHAO Jun.
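As a concrete reading of the estimating equation (1), the following minimal sketch takes the simplest fully observed, just-identified case (s = p = d) with the linear-regression choice g(X, Y, θ) = X(Y − Xᵀθ), and solves the sample analogue n⁻¹ Σᵢ g(Xᵢ, Yᵢ, θ) = 0. This choice of g, the simulated data, and the function names `g` and `solve_estimating_equation` are illustrative assumptions; none of this is the paper's AIPW-MI procedure, and no missing data are involved.

```python
import numpy as np


def g(x, y, theta):
    """Illustrative estimating function g(X, Y, theta) = X * (Y - X^T theta).

    Returns an (n, d) array with one row of g-values per observation.
    """
    residual = y - x @ theta
    return x * residual[:, None]


def solve_estimating_equation(x, y):
    """Solve the sample analogue (1/n) * sum_i g(X_i, Y_i, theta) = 0.

    For this linear g the root has a closed form: the least-squares solution
    of the normal equations X^T X theta = X^T y.
    """
    return np.linalg.solve(x.T @ x, x.T @ y)


# Simulated complete data (assumption: no missing responses in this sketch).
rng = np.random.default_rng(0)
n, d = 2000, 3
theta_true = np.array([1.0, -0.5, 2.0])
x = rng.normal(size=(n, d))
y = x @ theta_true + rng.normal(size=n)

theta_hat = solve_estimating_equation(x, y)

# Sample mean of g at the estimate; it should vanish up to rounding error.
g_bar = g(x, y, theta_hat).mean(axis=0)
```

In the over-identified case s > p, the sample equations generally cannot all be set to zero at once, which is one reason the empirical likelihood framework of Qin and Lawless is used for such problems.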

The choice of g(X, Y, θ) is flexible and accommodates a wide range of scenarios; see [1–4]. Empirical likelihood (EL) is a broadly applicable platform for constructing confidence regions for the parameters defined by (1) [5–7]. Unlike confidence regions constructed via normal approximation, EL confidence regions are transformation invariant and range respecting, have a data-driven shape, and are free of the burden of estimating scaling parameters [5]. In survey sampling, social science, epidemiological studies, and many other statistical problems, Y often has missing values. Let δ be the response status indicator for Y, where δ = 1 if Y is observed