Bayesian high-dimensional semi-parametric inference beyond sub-Gaussian errors



Journal of the Korean Statistical Society
https://doi.org/10.1007/s42952-020-00091-4

RESEARCH ARTICLE

Bayesian high-dimensional semi-parametric inference beyond sub-Gaussian errors

Kyoungjae Lee¹ · Minwoo Chae² · Lizhen Lin³

Received: 2 March 2020 / Accepted: 30 October 2020
© Korean Statistical Society 2020

Abstract

We consider a sparse linear regression model with unknown symmetric error under the high-dimensional setting. The true error distribution is assumed to belong to the locally 𝛽-Hölder class with an exponentially decreasing tail, and it need not be sub-Gaussian. We obtain posterior convergence rates for the regression coefficients and the error density, which are nearly optimal and adaptive to the unknown sparsity level. Furthermore, we derive a semi-parametric Bernstein-von Mises (BvM) theorem to characterize the asymptotic shape of the marginal posterior for the regression coefficients. Under a sub-Gaussianity assumption on the true score function, strong model selection consistency for the regression coefficients is also obtained, which in turn establishes the frequentist validity of credible sets.

Keywords: High-dimensional semi-parametric model · Posterior convergence rate · Bernstein-von Mises theorem · Strong model selection consistency

1 Introduction

We consider the linear regression model

Y = X𝜃 + 𝜖,  (1)

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s42952-020-00091-4) contains supplementary material, which is available to authorized users.

* Corresponding author: Kyoungjae Lee, [email protected]

1 Department of Statistics, Inha University, Incheon, South Korea
2 Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang, South Korea
3 Department of Applied and Computational Mathematics and Statistics, The University of Notre Dame, Notre Dame, USA




where Y = (Y₁, …, Yₙ)ᵀ ∈ ℝⁿ is a vector of response variables, X = (xᵢⱼ) ∈ ℝⁿˣᵖ is the n × p matrix of covariates whose i-th row is xᵢᵀ = (xᵢ₁, …, xᵢₚ), 𝜃 ∈ ℝᵖ is the p-dimensional regression coefficient, and 𝜖 = (𝜖₁, …, 𝜖ₙ) ∈ ℝⁿ is the vector of random errors with 𝜖ᵢ ~ 𝜂 i.i.d. for i = 1, …, n.

Statistical inference with the model (1) in high-dimensional settings has received increasing attention in recent years. For 𝜃 to be estimable when p is large, a sparsity condition is often imposed, which assumes that most components of 𝜃 are nearly zero. Under the sparsity assumption, regularization methods have been at the center of statistical research due to their computational tractability, ease of interpretation, elegant theory and good performance in practice. Some pioneering references include Tibshirani (1996), Fan and Li (2001), Tibshirani et al. (2005), Zou and Hastie (2005), Zou (2006), Candes and Tao (2007) and Zhang and Zhang (2014). We also refer to the monograph Bühlmann and van de Geer (2011) for a review with abundant examples. In a Bayesian framework, the sparsity can be
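As a concrete illustration of model (1), the following sketch simulates data from a sparse high-dimensional linear model with symmetric, heavier-than-Gaussian (here, Laplace) errors. The dimensions n, p and the sparsity level s0 are illustrative choices, not values from the paper.

```python
import numpy as np

# Hypothetical simulation of model (1): Y = X @ theta + eps, where theta is
# sparse (only s0 nonzero entries) and the errors are symmetric Laplace,
# i.e. not sub-Gaussian but with an exponentially decreasing tail.
rng = np.random.default_rng(0)
n, p, s0 = 100, 200, 5            # high-dimensional regime: p > n

X = rng.standard_normal((n, p))   # n x p covariate matrix
theta = np.zeros(p)
theta[:s0] = rng.uniform(1.0, 2.0, size=s0)   # s0 nonzero coefficients

eps = rng.laplace(loc=0.0, scale=1.0, size=n) # i.i.d. symmetric errors
Y = X @ theta + eps               # response vector of model (1)
```

Under this data-generating process, any estimator of 𝜃 must exploit the sparsity (s0 = 5 out of p = 200 coefficients are nonzero), since p exceeds n and the design matrix is not of full column rank in 𝜃.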