An Efficient Multiple Imputation Approach for Estimating Equations with Response Missing at Random and High-Dimensional

An Efficient Multiple Imputation Approach for Estimating Equations with Response Missing at Random and High-Dimensional Covariates∗

WANG Lei · SUN Siying · XIA Zheng

DOI: 10.1007/s11424-020-9133-9
Received: 21 April 2019 / Revised: 27 August 2019
© The Editorial Office of JSSC & Springer-Verlag GmbH Germany 2020

Abstract Empirical-likelihood-based inference for parameters defined by the general estimating equations of Qin and Lawless (1994) remains an active research topic. When the response is missing at random (MAR) and the dimension of the covariates is not low, the authors propose a two-stage estimation procedure that combines dimension-reduced kernel estimators with an unbiased estimating function based on augmented inverse probability weighting and multiple imputation (AIPW-MI). The authors show that the resulting estimator is consistent and asymptotically normal. In addition, the corresponding empirical likelihood ratio statistics asymptotically follow central chi-square distributions when evaluated at the true parameter. The finite-sample performance of the proposed estimator is studied through simulation, and an application to an HIV-CD4 data set is also presented.

Keywords Consistency and asymptotic normality, dimension reduction, kernel-assisted, missing at random, multiple imputation.

1 Introduction

Consider statistical inference on a p-dimensional parameter vector θ₀ ∈ Θ defined to be the unique solution to an s-dimensional estimating equation

$$E\{g(X, Y, \theta)\} = 0, \quad \theta \in \Theta, \tag{1}$$

where Θ is the parameter space, Y is a real-valued outcome or response, X is a d-dimensional covariate vector, and g is a known s-dimensional continuously differentiable function with s ≥ p.

WANG Lei · SUN Siying · XIA Zheng
School of Statistics and Data Science, LPMC & KLMDASR, Nankai University, Tianjin 300071, China. Email: [email protected].
∗ This paper was supported by the National Natural Science Foundation of China under Grant Nos. 11871287, 11501208, 11771144, 11801359, the Natural Science Foundation of Tianjin under Grant No. 18JCYBJC41100, Fundamental Research Funds for the Central Universities and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin. The first two authors contributed equally to this work.
This paper was recommended for publication by Editor SHAO Jun.
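As a concrete illustration of an estimating equation of the form (1), the sketch below takes the textbook choice g(X, Y, θ) = (Y − θ₁ − θ₂X, X(Y − θ₁ − θ₂X)) for simple linear regression with a scalar covariate, so that s = p = 2, and solves the sample analogue of (1) on fully observed simulated data. This example is an assumption made for illustration only: the function names, the simulated data, and the choice of g are not taken from the paper, and the sketch does not handle missing responses or implement the proposed AIPW-MI procedure.

```python
import random


def g(x, y, theta):
    """Illustrative estimating function for simple linear regression (s = p = 2)."""
    resid = y - theta[0] - theta[1] * x
    return (resid, x * resid)


def g_bar(xs, ys, theta):
    """Sample analogue of E{g(X, Y, theta)}: the average of g over the data."""
    n = len(xs)
    vals = [g(x, y, theta) for x, y in zip(xs, ys)]
    return tuple(sum(v[k] for v in vals) / n for k in range(2))


def solve_theta(xs, ys):
    """Solve g_bar(theta) = 0; g is linear in theta, so this is a 2x2 linear system."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    mxx = sum(x * x for x in xs) / n
    mxy = sum(x * y for x, y in zip(xs, ys)) / n
    slope = (mxy - mx * my) / (mxx - mx * mx)
    intercept = my - slope * mx
    return (intercept, slope)


# Simulated complete data (hypothetical): Y = 1 + 2 X + noise.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(500)]
ys = [1.0 + 2.0 * x + random.gauss(0.0, 0.5) for x in xs]

theta_hat = solve_theta(xs, ys)
print("theta_hat:", theta_hat)
print("g_bar at theta_hat:", g_bar(xs, ys, theta_hat))
```

With s = p the sample equation can be solved exactly, and here the solution coincides with the ordinary least squares fit. When s > p the system is over-identified and generally has no exact solution, which is one reason criteria such as empirical likelihood are used instead.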

The choice of g(X, Y, θ) is flexible and accommodates a wide range of scenarios; see [1–4]. Empirical likelihood (EL), in turn, is a broadly applicable platform for constructing confidence regions for the parameters defined by (1) [5–7]. Unlike confidence regions constructed via normal approximation, EL confidence regions are transformation invariant, are range respecting, have a data-driven shape, and are free of the burden of estimating scaling parameters [5]. In survey sampling, social science, epidemiological studies and many other statistical problems, Y often has missing values. Let δ be the response status indicator for Y, where δ = 1 if Y is observed
