Statistical data integration in survey sampling: a review
Theory and Practice of Surveys
Shu Yang¹ · Jae Kwang Kim²

Received: 6 January 2020 / Accepted: 13 September 2020
© Japanese Federation of Statistical Science Associations 2020
Abstract

Finite population inference is a central goal in survey sampling, and probability sampling is the main statistical approach to it. Challenges arise due to high costs and increasing non-response rates. Data integration provides a timely solution by leveraging multiple data sources to yield more robust and efficient inference than any single data source alone. Data integration techniques vary depending on the types of samples and the available information to be combined. This article provides a systematic review of data integration techniques for combining probability samples, probability and non-probability samples, and probability and big data samples. We discuss a wide range of integration methods, including generalized least squares, calibration weighting, inverse probability weighting, mass imputation, and doubly robust methods. Finally, we highlight important questions for future research.

Keywords: Generalizability · Meta-analysis · Missing at random · Transportability
* Jae Kwang Kim
[email protected]

¹ Department of Statistics, North Carolina State University, Raleigh, USA
² Department of Statistics, Iowa State University, Ames, USA

Japanese Journal of Statistics and Data Science

1 Introduction

Probability sampling is regarded as the gold standard in survey statistics for finite population inference. Fundamentally, probability samples are selected under known sampling designs and, therefore, are representative of the target population. Because the selection probability is known, the subsequent inference from a probability sample is often design-based and respects the way in which the data were collected; see Särndal et al. (2003), Cochran (1977), and Fuller (2009) for textbook discussions. Kalton (2019) provided a comprehensive overview of survey sampling research over the last 60 years. However, many practical challenges arise in collecting and analyzing probability sample data (Baker et al. 2013; Keiding and Louis 2016). Large-scale survey programs continually face heightened demands coupled with reduced resources. Demands include requests for estimates for domains with small sample sizes and desires for more timely estimates. Simultaneously, program budget cuts force reductions in sample sizes, and decreasing response rates make non-response bias an important concern.

Data integration is a new area of research that provides a timely solution to the above challenges. The goal is multi-fold: (1) minimize the cost associated with surveys, (2) minimize the respondent burden, and (3) maximize the statistical information, or equivalently the efficiency, of survey estimation. Narrowly speaking, survey integration means combining separate probability samples into one survey instrument (Bycroft 2010). Broadly speaking
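To make the design-based idea above concrete, the following sketch (not from the paper; all data and design parameters are invented for illustration) computes the classical Horvitz–Thompson estimator of a finite-population total, which uses exactly the known inclusion probabilities the introduction refers to:

```python
import numpy as np

# Illustrative sketch (hypothetical data): design-based estimation of a
# finite-population total via the Horvitz-Thompson estimator.

rng = np.random.default_rng(0)

N = 10_000                          # finite population size (assumed)
y = rng.normal(50, 10, size=N)      # study variable over the population

# Unequal-probability design: inclusion probabilities proportional to a
# hypothetical size measure, scaled to an expected sample size of 500.
size = rng.uniform(1, 5, size=N)
pi = 500 * size / size.sum()        # all pi_i < 1 here

# Poisson sampling: each unit is included independently with probability pi_i.
sampled = rng.uniform(size=N) < pi

# Horvitz-Thompson estimator of the population total: sum of y_i / pi_i over
# sampled units; design-unbiased because E[indicator_i] = pi_i.
ht_total = np.sum(y[sampled] / pi[sampled])

print(f"true total: {y.sum():.0f}, HT estimate: {ht_total:.0f}")
```

The same inverse-probability weights underlie many of the integration methods the paper reviews; the design choice here (Poisson sampling) keeps the example short because inclusion indicators are independent.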