Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis


Liya Fu1 · Zhuoran Yang1 · Fengjing Cai2 · You-Gan Wang3

Received: 1 February 2020 / Accepted: 1 October 2020

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

New technologies have produced increasingly complex and massive datasets, such as next-generation sequencing and microarray data in biology, dynamic treatment regimes in clinical trials, and long-term wide-scale studies in the social sciences. Each study exhibits its own data structure within individuals and clusters, and possibly across time and space. To draw valid and efficient inferences from such large-dimensional data, we must account for intracluster correlations, varying cluster sizes, and outliers in the response and/or covariate domains. A weighted rank-based method is proposed for selecting variables and estimating parameters simultaneously. The main contribution of the proposed method is fourfold: (1) variable selection using the adaptive lasso is extended to robust rank regression, so that protection against outliers in both response and predictor variables is obtained; (2) within-subject correlations are incorporated, so that the efficiency of parameter estimation is improved; (3) the computation is convenient via an existing function in the statistical software R; and (4) the proposed method is proved to have desirable asymptotic properties for a fixed number of covariates (p). Simulation studies are carried out to evaluate the proposed method in a number of scenarios, including cases in which p equals the number of subjects. The simulation results indicate that the proposed method is efficient and robust. A hormone dataset is analyzed for illustration. After redundant variables are added as covariates, the penalty approach and weighting schemes are shown to remain effective.

Keywords Correlated data · Outliers · Rank-based method · Variable selection
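To make the idea of combining a rank-based loss with an adaptive-lasso penalty concrete, the following is a minimal sketch and not the authors' implementation. It assumes independent observations, a pairwise-difference (Wilcoxon-type) rank loss, and user-supplied penalty weights; it ignores the within-subject correlations and outlier-downweighting schemes that the abstract describes. All function names (`weighted_median`, `pairwise_differences`, `objective`, `fit_rank_lasso`) are hypothetical. The solver is plain coordinate descent, which never increases the objective but is not guaranteed to reach the global minimum of a non-smooth loss.

```python
import random


def weighted_median(points, weights):
    """Return a minimizer b of sum_i weights[i] * |points[i] - b|."""
    pairs = sorted(zip(points, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def pairwise_differences(X, y):
    """Build (x_i - x_j, y_i - y_j) for all pairs i < j."""
    dX, dy = [], []
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            dX.append([a - b for a, b in zip(X[i], X[j])])
            dy.append(y[i] - y[j])
    return dX, dy


def objective(dX, dy, beta, lam, pen_w):
    """Pairwise rank loss plus a weighted L1 penalty (assumed form)."""
    loss = sum(abs(d - sum(x * b for x, b in zip(row, beta)))
               for row, d in zip(dX, dy))
    return loss + lam * sum(w * abs(b) for w, b in zip(pen_w, beta))


def fit_rank_lasso(X, y, lam, pen_w, n_sweeps=50):
    """Coordinate descent: each update is an exact weighted median."""
    dX, dy = pairwise_differences(X, y)
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for k in range(p):
            # The penalty acts like one extra pseudo-observation at zero.
            points, weights = [0.0], [lam * pen_w[k]]
            for row, d in zip(dX, dy):
                if row[k] == 0.0:
                    continue  # this pair does not depend on beta_k
                partial = d - sum(row[l] * beta[l] for l in range(p) if l != k)
                points.append(partial / row[k])
                weights.append(abs(row[k]))
            beta[k] = weighted_median(points, weights)
    return beta


# Illustrative data (simulated, not from the paper): only the first covariate matters.
random.seed(0)
X_demo = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(20)]
y_demo = [2.0 * row[0] + random.gauss(0, 0.5) for row in X_demo]
print(fit_rank_lasso(X_demo, y_demo, lam=5.0, pen_w=[1.0, 1.0]))
```

In an adaptive-lasso setting, `pen_w` would typically be set to the reciprocal of the absolute value of an initial unpenalized estimate, so that coefficients with small initial estimates are penalized more heavily. Treating the penalty as an extra pseudo-observation at zero is what lets each coordinate update reduce to a weighted median.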

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00180-020-01038-3) contains supplementary material, which is available to authorized users.

Extended author information available on the last page of the article


1 Introduction

Longitudinal data are commonly used in economics, medical studies, and environmental research. A large number of covariates are often collected in longitudinal studies, and including redundant variables can reduce the accuracy and efficiency of parameter estimation. It is therefore important to select the appropriate covariates when analyzing longitudinal data. However, selecting significant variables in longitudinal data is challenging because of the underlying correlations and because the likelihood is unavailable. Fan and Li (2004) provided a penalized weighted least-squares approach for variable selection in a semiparametric model in longitudinal data analysis. Ni et al. (2010) proposed a double-penalized Gaussian likelihood approach for simultaneous model selection an
