Empirical likelihood and estimating equations for survey data analysis
Theory and Practice of Surveys
Changbao Wu¹ · Mary E. Thompson¹

Received: 29 February 2020 / Accepted: 3 July 2020
© Japanese Federation of Statistical Science Associations 2020
Abstract
This paper provides an overview of empirical likelihood methods for analysis of survey data when the finite population parameters are defined through a set of census estimating equations. The general inferential framework involving both the superpopulation and the finite population parameters is described, and inferential procedures for point estimation, hypothesis testing, variable selection, and Bayesian analysis, along with the main computational procedures, are discussed.

Keywords Bayesian inference · Complex survey data · Estimating functions · Finite population parameters · Hypothesis testing · Regression analysis · Superpopulation models · Variable selection
1 Estimating equations and empirical likelihood

Maximum-likelihood and least-squares estimation methods are two fundamental pillars of the modern statistical sciences. Suppose that (y_1, …, y_n) is an independent and identically distributed (iid) sample from a random variable Y with an assumed parametric distribution f(y; θ). Under certain regularity conditions, the maximum-likelihood estimator θ̂ of θ, which maximizes the likelihood function L(θ) = ∏_{i=1}^n f(y_i; θ), is the solution to the score equations:
This research is supported by grants from the Natural Sciences and Engineering Research Council of Canada.

* Changbao Wu [email protected]
  Mary E. Thompson [email protected]

¹ Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
$$
\frac{\partial}{\partial \theta} \log L(\theta) = \sum_{i=1}^{n} \frac{\partial}{\partial \theta} \log f(y_i; \theta) = \mathbf{0}. \qquad (1)
$$
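As a small numerical illustration (not from the paper), the score equations can be solved directly with a root finder. The sketch below assumes an exponential model f(y; θ) = θ e^{−θ y}, for which the score function is n/θ − Σᵢ yᵢ and the closed-form MLE is θ̂ = 1/ȳ; the data and model choice are ours for illustration only.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=500)  # simulated sample; true rate is 0.5

def score(theta, y):
    # d/dtheta of sum_i log f(y_i; theta) for f(y; theta) = theta * exp(-theta * y)
    return len(y) / theta - y.sum()

# Solve the score equation score(theta) = 0 on a bracketing interval
theta_hat = brentq(score, 1e-6, 100.0, args=(y,))

# For this model the root coincides with the closed-form MLE 1 / mean(y)
assert np.isclose(theta_hat, 1.0 / y.mean())
```

The same root-finding recipe applies whenever the score equations have no closed-form solution, which is the typical situation for the estimating equations discussed in this paper.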
When the response variable y_i is related to a vector of covariates x_i and the main objective is to explore relations between y and x, a semiparametric regression model can be specified through the first two conditional moments E_ξ(y_i ∣ x_i) = μ(x_i; θ) and V_ξ(y_i ∣ x_i) = v_i σ², where μ(x_i; θ) is the mean function, which can be linear or nonlinear in the vector of parameters θ, and the v_i are known constants which might depend on the given x_i. The notations E_ξ(·) and V_ξ(·) refer to expectation and variance under the assumed semiparametric model, ξ. The weighted least-squares estimator θ̂ of θ, which minimizes the weighted sum of squares of residuals Q(θ) = ∑_{i=1}^n {y_i − μ(x_i; θ)}²/v_i, is the solution to the normal equations:

$$
\frac{\partial}{\partial \theta} Q(\theta) = -2 \sum_{i=1}^{n} \mathbf{D}(\mathbf{x}_i; \theta)\, v_i^{-1} \{ y_i - \mu(\mathbf{x}_i; \theta) \} = \mathbf{0}, \qquad (2)
$$
where D(x_i; θ) = ∂μ(x_i; θ)/∂θ. For linear regression models where μ(x_i; θ) = x_i′θ, we have D(x_i; θ) = x_i. For generalized linear models with μ_i = μ(x_i; θ) = μ(x_i′θ) and v_i = v(μ_i), where μ(·) is the inverse of a link function and v(·) is a variance function, the solution to (2) is called the quasi-maximum-likelihood estimator of θ (McCullagh and Nelder 1983). The score equations (1) and the normal equations (
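To make the linear special case concrete, here is an illustrative sketch (ours, not from the paper) of solving the normal equations (2) when μ(x_i; θ) = x_i′θ and D(x_i; θ) = x_i, so that (2) reduces to the weighted least-squares system (X′V⁻¹X)θ = X′V⁻¹y with V = diag(v_1, …, v_n); all data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus one covariate
theta_true = np.array([1.0, 2.0])
v = rng.uniform(0.5, 2.0, size=n)                      # known variance constants v_i
y = X @ theta_true + rng.normal(size=n) * np.sqrt(v)   # V(y_i | x_i) = v_i * sigma^2, sigma = 1

# Normal equations (2) in the linear case:
#   sum_i x_i v_i^{-1} (y_i - x_i' theta) = 0   <=>   (X' W X) theta = X' W y,  W = V^{-1}
W = np.diag(1.0 / v)
theta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# The estimating-equation residual at theta_hat should be numerically zero
ee_resid = X.T @ W @ (y - X @ theta_hat)
assert np.allclose(ee_resid, 0.0, atol=1e-8)
```

For nonlinear μ(x_i; θ), the same estimating equation is typically solved by iterating this weighted linear solve with X replaced by the Jacobian D(x_i; θ) evaluated at the current iterate (Gauss-Newton style).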