Overcoming the Computing Barriers in Statistical Causal Inference

Kai Zhang and Ding-Geng Chen

Abstract The extension of statistical causal inference to the era of big data, common in public health applications, is often hindered by computational barriers. In this chapter we discuss a practical computing barrier in statistical causal inference, using optimal pair matching as an example, and offer a novel solution: constructing a stratification tree based on exact matching and propensity scores. We demonstrate the implementation of this method with a large observational study of obstetric unit closures in Philadelphia from 1995 to 2003, with 59 observed covariates for each of the 132,786 birth deliveries and 5,998,111 potential controls. Algorithms and R code are also provided for interested readers.
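To give a sense of the scale of the computing barrier, the sketch below counts the treated-control distances that a dense distance matrix for optimal pair matching would need, using the sample sizes quoted in the abstract. It assumes one 8-byte floating-point number per pair. The helper `pairwise_distances` and the toy strata at the end are hypothetical, included only to show why restricting comparisons to exact-match strata shrinks the problem; this is not the chapter's stratification-tree algorithm.

```python
# Back-of-the-envelope size of a dense treated-by-control distance matrix,
# using the counts quoted in the abstract.
N_TREATED = 132_786
N_CONTROL = 5_998_111
BYTES_PER_DISTANCE = 8  # assumption: one float64 per treated-control pair


def pairwise_distances(strata):
    """Number of treated-control distances needed when matching is only
    allowed within strata; `strata` is a list of (n_treated, n_control)."""
    return sum(n_t * n_c for n_t, n_c in strata)


# No stratification: every delivery is compared with every potential control.
full = pairwise_distances([(N_TREATED, N_CONTROL)])
print(f"{full:,} distances, about {full * BYTES_PER_DISTANCE / 1e12:.1f} TB")
# -> 796,465,167,246 distances, about 6.4 TB

# Hypothetical toy example: exact matching on a binary covariate splits
# 4 treated units and 10 controls into two strata.
print(pairwise_distances([(4, 10)]))          # one stratum: 40 distances
print(pairwise_distances([(3, 6), (1, 4)]))   # two strata: 18 + 4 = 22
```

Under the float64 assumption, the full matrix holds roughly 8 × 10^11 entries, more than 6 TB, before any matching algorithm even starts. Once matching is confined to strata, the cost becomes a sum of much smaller products.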

1 Statistical Causal Inference and Optimal Pair Matching

In standard statistical modelling, such as typical regression, estimation, and hypothesis testing techniques, we estimate the parameters of a statistical distribution from samples drawn from that distribution. With these estimated parameters, we can then make statistical inferences about the associations among variables and estimate the probabilities of past and future events in light of new evidence or new measurements. These inferences are justified only under the same experimental conditions, which are assumed to remain static throughout statistical design and data collection. This static assumption is a frequent subject of debate in standard statistical modelling.
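Optimal pair matching, named in the title of this section, pairs each treated unit with a distinct control so that the total within-pair distance is as small as possible. A minimal sketch, assuming a one-dimensional distance on hypothetical propensity scores and using SciPy's `linear_sum_assignment` (a general assignment solver, not the chapter's R code), poses it as a rectangular assignment problem:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical propensity scores, for illustration only.
treated_ps = np.array([0.30, 0.55, 0.80])
control_ps = np.array([0.10, 0.28, 0.52, 0.65, 0.85])

# Dense distance matrix: |difference in propensity score| for every
# treated (row) and control (column) pair.
dist = np.abs(treated_ps[:, None] - control_ps[None, :])

# Optimal 1:1 matching without replacement is a rectangular assignment
# problem: choose one distinct column per row to minimise the total distance.
rows, cols = linear_sum_assignment(dist)
for t, c in zip(rows, cols):
    print(f"treated {t} -> control {c} (distance {dist[t, c]:.2f})")
print(f"total distance {dist[rows, cols].sum():.2f}")
```

Unlike greedy nearest-neighbour matching, the assignment solver minimises the total distance jointly over all pairs. The price is a dense treated-by-control cost matrix and a running time that grows much faster than linearly, which is what becomes infeasible at the scale of the Philadelphia study.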

K. Zhang, Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
D.-G. Chen, School of Social Work & Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
© Springer International Publishing Switzerland 2016
H. He et al. (eds.), Statistical Causal Inferences and Their Applications in Public Health Research, ICSA Book Series in Statistics, DOI 10.1007/978-3-319-41259-7_7

By relaxing the static assumption of statistical modelling, causal inference goes one step further: it infers not only probabilities under static conditions but also how outcomes change when conditions are altered by treatments or external interventions. This distinction implies a fundamental difference between causal and associational concepts. In standard statistical modelling, the estimated distribution function cannot tell us how that distribution would differ if external conditions were to change, for example from an observational to an experimental setup. This information must be supplied by causal assumptions that identify relationships which remain invariant when external conditions change. The fundamental problem of causal inference is often framed in terms of the counterfactual. This counterfactual can