A robust and efficient estimation and variable selection method for partially linear models with large-dimensional covariates

Hu Yang¹ · Ning Li¹ · Jing Yang²

Received: 19 December 2016 / Revised: 16 April 2018 © Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract In this paper, a new robust and efficient estimation approach based on local modal regression is proposed for partially linear models with large-dimensional covariates. We show that the resulting estimators for both the parametric and nonparametric components are more efficient in the presence of outliers or a heavy-tailed error distribution, and as asymptotically efficient as the corresponding least squares estimators when there are no outliers and the error distribution is normal. We also establish the asymptotic properties of the proposed estimators when the covariate dimension diverges at the rate of $o(\sqrt{n})$. To achieve sparsity and enhance interpretability, we develop a variable selection procedure based on the SCAD penalty to select significant parametric covariates and show that the method enjoys the oracle property under mild regularity conditions. Moreover, we propose a practical modified MEM algorithm for the proposed procedures. Some Monte Carlo simulations and a real data analysis are conducted to illustrate the finite sample performance of the proposed estimators. Finally, based on the idea of the sure independence screening procedure proposed by Fan and Lv (J R Stat Soc 70:849–911, 2008), a robust two-step approach is introduced to deal with ultra-high dimensional data.

Keywords Partially linear models · Robust estimation · Variable selection · Oracle property

This work is supported by the National Natural Science Foundation of China (Grant No. 11671059).

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00362018-1013-1) contains supplementary material, which is available to authorized users.

Ning Li [email protected]
Hu Yang [email protected]
Jing Yang [email protected]

1 College of Mathematics and Statistics, Chongqing University, Chongqing 401331, China

2 Key Laboratory of High Performance Computing and Stochastic Information Processing (Ministry of Education of China), College of Mathematics and Statistics, Hunan Normal University, Changsha 410081, China

1 Introduction

Consider the partially linear model (PLM)

$$Y = X^T \beta + f(Z) + \varepsilon, \tag{1}$$

where $X = (x_1, \ldots, x_{p_n})^T \in \mathbb{R}^{p_n}$ and $Z = (z_1, \ldots, z_q)^T \in \mathbb{R}^q$ are the covariates in the parametric and nonparametric components, $\beta = (\beta_1, \ldots, \beta_{p_n})^T$ is a $p_n$-dimensional vector of unknown parameters, $f(\cdot)$ is an unknown smooth function, and the random error $\varepsilon$ satisfies $E(\varepsilon \mid X, Z) = 0$. Since it was first introduced by Engle et al. (1986), the PLM has been extensively studied in the literature. For example, see Robinson (1988), Speckman (1988), Zeger and Diggle (1994), Severini and Staniswalis (1994) and Härdle et al. (2000). In practice, large numbers of variables are usually included in the regression model to reduce the po