Variable selection for generalized partially linear models with longitudinal data



SPECIAL ISSUE

Jinghua Zhang1,2 · Liugen Xue2

Received: 20 May 2020 / Revised: 21 October 2020 / Accepted: 28 October 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Variable selection and parameter estimation are of great significance in all regression analysis. A variety of approaches have been proposed to tackle this problem. Among them, the penalty-based shrinkage approach has been the most popular for its ability to carry out variable selection and parameter estimation simultaneously. However, little work is available on variable selection for generalized partially linear models (GPLMs) with longitudinal data. In this paper, we propose a variable selection procedure for GPLMs with longitudinal data. The inference is based on SCAD-penalized quadratic inference functions, obtained after approximating the nonparametric function in the model by B-splines. The proposed approach efficiently utilizes the within-cluster correlation information, which improves estimation efficiency, and it also has the virtue of low computational cost. With the tuning parameter chosen by BIC, the correct model is identified with probability tending to 1. The resulting estimator of the parametric component is asymptotically normal, and that of the nonparametric function achieves the optimal convergence rate. The performance of the proposed methods is evaluated through extensive simulation studies. A real data analysis shows that the proposed approach succeeds in excluding the insignificant variables.

Keywords  Variable selection · Longitudinal data · Quadratic inference functions · Generalized partially linear models

Mathematics Subject Classification  62G08 · 62F12
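The SCAD penalty that drives the shrinkage in the penalized quadratic inference functions has a standard closed form (Fan and Li 2001): a quadratic spline that acts like an L1 penalty near zero, transitions quadratically, and is constant for large coefficients, so large effects are left unshrunk. A minimal sketch of the penalty and its derivative is below; the function names are illustrative (not from the paper), and a = 3.7 is Fan and Li's usual recommendation rather than a choice stated in this abstract.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|): three-piece quadratic spline.

    |theta| <= lam        : lam * |theta|            (L1-like near zero)
    lam < |theta| <= a*lam: quadratic transition
    |theta| > a*lam       : constant (a + 1) * lam^2 / 2  (no shrinkage)
    """
    t = np.abs(np.asarray(theta, dtype=float))
    small = t <= lam
    mid = (t > lam) & (t <= a * lam)
    out = np.full_like(t, (a + 1) * lam ** 2 / 2)
    out[small] = lam * t[small]
    out[mid] = -(t[mid] ** 2 - 2 * a * lam * t[mid] + lam ** 2) / (2 * (a - 1))
    return out

def scad_derivative(theta, lam, a=3.7):
    """First derivative p'_lambda(theta) for theta >= 0, the quantity used
    in local approximations of the penalized objective."""
    t = np.abs(np.asarray(theta, dtype=float))
    return lam * ((t <= lam)
                  + np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam))
```

The derivative vanishes for |theta| > a*lam, which is what gives SCAD its unbiasedness for large coefficients, in contrast to the LASSO, whose constant derivative lam shrinks every coefficient.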

1 Introduction

Identifying the significant variables is of great importance in all regression analysis. In practice, a number of variables are available for an initial analysis, but many of them may not be significant and should be excluded from the final model in order to increase the accuracy of prediction. On the other hand, an under-fitted model, which excludes some significant variables, will lead to a biased estimator. Various procedures and criteria, such as subset selection and stepwise selection with the Akaike information criterion (AIC), Mallows' Cp, and the Bayesian information criterion (BIC), have been developed. Unfortunately, the former suffers from expensive computational costs, and the latter lacks stability. Many shrinkage methods have been developed for the sake of computational efficiency, e.g., the nonnegative garrotte [1], the LASSO [2], bridge regression [3], the SCAD [4], and the one-step sparse estimator [5]. Among those, the LASSO has been widely used since Efron

* Jinghua Zhang [email protected]
  Liugen Xue [email protected]

1 Department of Information Engineering, Jingdezhen Ceramic Institute, Jiangxi, China
2 College of Applied Sciences, Beijing University of Technology, Beijing, China