Doubly robust augmented-estimating-equations estimation with nonignorable nonresponse data



Tianqing Liu¹ · Xiaohui Yuan²

Received: 10 December 2017 / Revised: 2 June 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract

The problem of nonignorable nonresponse data is ubiquitous in medical and social science studies. Analyses that rely solely on the missing-at-random assumption may lead to biased results. Various debiasing methods have been studied extensively in the literature, particularly doubly robust (DR) estimators. We propose DR augmented-estimating-equations (AEE) estimators of the mean response, which enjoy the double-robustness property under correct specification of the log odds ratio model. An advantage of DR AEE estimators is that they can use the completely observed covariates to improve on the estimation efficiency of existing DR estimators with nonignorable nonresponse data. We propose a model selection criterion that can consistently select the correct parametric form of the log odds ratio model from a group of candidate models. Moreover, the correctness of the required working models can be evaluated via straightforward goodness-of-fit tests. Simulation results indicate that the DR AEE estimators are robust to misspecification of either the baseline outcome density model or the baseline response model, and that they dominate their competitors in the sense of having smaller mean-square errors. The analysis of a real dataset illustrates the flexibility and usefulness of the proposed methods.

Keywords Augmented estimating equations · Doubly robust · Goodness-of-fit tests · Non-ignorable missing data · Nonresponse instrumental variable

Tianqing Liu [email protected]
Xiaohui Yuan [email protected]

¹ School of Mathematics, Jilin University, Changchun 130012, Jilin, China
² School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, Jilin, China


1 Introduction

Let y denote the outcome of interest, which may not be observed for all subjects, and let w = (x^T, z^T) be an l-dimensional vector of covariates that is always observed. Let r be the response indicator of y, i.e., r = 1 if y is observed and r = 0 otherwise. In the statistical literature, nonignorable missingness (Little and Rubin 2002) is the most difficult problem, because the response probability pr(r = 1|w, y) depends on y regardless of whether y is observed or missing, and the joint distribution of (w, y, r) is not identifiable without further restrictions on pr(r = 1|w, y). For model identification, we assume throughout that the fully observed nonresponse instrumental variable z (Wang et al. 2014; Miao and Tchetgen 2016; Choi and Lee 2017) is associated with y given x but is conditionally independent of r given (y, x):

$$ z \not\perp\!\!\!\perp y \mid x \quad \text{and} \quad z \perp\!\!\!\perp r \mid (y, x). \tag{1} $$
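The setup above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation design: the normal distributions, the linear outcome model, and the logistic response model with its coefficients are all hypothetical choices made only so that z is associated with y given x while the response probability depends on (x, y) but not on z. It compares the mean of y over all subjects with the mean over respondents only.

```python
import math
import random


def simulate(n=20000, seed=0):
    """Draw (x, z, y, r) under a hypothetical nonignorable response model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # always-observed covariate
        z = rng.gauss(0.0, 1.0)  # candidate nonresponse instrument
        # y depends on z given x, so z is NOT independent of y given x.
        y = 1.0 + x + z + rng.gauss(0.0, 1.0)
        # pr(r = 1 | w, y) involves (x, y) only, so z is independent of r given (y, x).
        # The coefficients below are illustrative assumptions.
        p = 1.0 / (1.0 + math.exp(-(0.5 + 0.5 * x - 1.0 * y)))
        r = 1 if rng.random() < p else 0
        rows.append((x, z, y, r))
    return rows


def means(rows):
    """Return (mean of y over everyone, mean of y over respondents only)."""
    full = sum(y for _, _, y, _ in rows) / len(rows)
    observed = [y for _, _, y, r in rows if r == 1]
    complete_case = sum(observed) / len(observed)
    return full, complete_case


rows = simulate()
full_mean, complete_case_mean = means(rows)
print(f"mean of y over all subjects: {full_mean:.3f}")
print(f"mean of y over respondents:  {complete_case_mean:.3f}")
```

Because the assumed response probability decreases in y, respondents tend to have smaller outcomes, so the respondent-only mean falls below the mean over all subjects. In practice the latter is unavailable, which is why additional structure such as assumption (1) is needed.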

Under assumption (1), Miao and Tchetgen (2016) factorized the conditional density function of (z, y, r) given x as

$$ f(z, y, r \mid x) = c(x) \exp\{(1 - r)\,\mathrm{OR}(y \mid x)\}\, \mathrm{pr}(r \mid y = 0, x)\, f(z, y \mid r = 1, x), \tag{2} $$

where c(x) = pr(r = 1|x)