Robust doubly protected estimators for quantiles with missing data

Mariela Sued¹ · Marina Valdora² · Víctor Yohai¹

Received: 27 July 2018 / Accepted: 10 November 2019
© Sociedad de Estadística e Investigación Operativa 2019

Abstract

Doubly protected methods are widely used for estimating the population mean of an outcome Y from a sample where the response is missing for some individuals. To compensate for the missing responses, a vector X of covariates is observed for each individual, and the missing mechanism is assumed to be independent of the response, conditioned on X (missing at random). In recent years, many authors have turned from the estimation of the mean to that of the median, and more generally, doubly protected estimators of the quantiles have been proposed. In this work, we present doubly protected estimators for the quantiles in semiparametric models that are also robust, in the sense that they are resistant to the presence of outliers in the sample.

Keywords Robust estimators · Missing data · Median · Quantiles · Propensity score · Doubly protected estimators

This research was partially supported by Grant PICT 2014-0351 from ANPCYT and Grants 20020150200110BA and 20020130100279BA from the Universidad de Buenos Aires at Buenos Aires, Argentina.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11749-019-00689-9) contains supplementary material, which is available to authorized users.

¹ Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires and CONICET, Intendente Guiraldes 2160, Ciudad Universitaria, Pabellón II - 2do. piso, C1428EGA Buenos Aires, Argentina

² Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Intendente Guiraldes 2160, Ciudad Universitaria, Pabellón II - 2do. piso, C1428EGA Buenos Aires, Argentina

Mathematics Subject Classification Primary 62F35 · Secondary 62F12

1 Introduction

Missing values occur in data sets from many fields and have attracted the attention of the statistical community in recent decades. In particular, the estimation of the quantiles of an outcome Y from an incomplete data set under the missing at random (MAR) assumption has recently been considered by many authors; see, for instance, Bianco et al. (2018), Díaz (2017) and Zhang et al. (2012). MAR states that the variable of interest Y and the response indicator A are conditionally independent given an always observed vector X ∈ R^p of covariates; see Rubin (1976). Causal inference is an area where missing data inevitably occur because counterfactual variables can never be observed simultaneously. The average treatment effect and the effect of a treatment on the quantiles are defined in terms of the means and quantiles of counterfactual variables, and thus, the estimation of the mean and of the