An Easy-to-Implement Hierarchical Standardization for Variable Selection Under Strong Heredity Constraint


Kedong Chen¹ · William Li² · Sijian Wang³

© Grace Scientific Publishing 2020

Abstract For many practical problems, regression models follow the strong heredity property (also known as marginality), which requires that the parent main effects be included whenever a second-order effect is present in the model. Existing methods rely mostly on special penalty functions or algorithms to enforce strong heredity in variable selection. We propose a novel hierarchical standardization procedure that maintains strong heredity in variable selection. Our method is effortless to implement and is applicable to any variable selection method for any type of regression. The performance of the hierarchical standardization is comparable to that of the regular standardization. We also provide robustness checks and a real data analysis to illustrate the merits of our method.

Keywords Hierarchical standardization · Hierarchical structure · Heredity · Variable selection · Marginality

Part of a special issue guest edited by Pritam Ranjan and Min Yang.

* William Li [email protected]
Kedong Chen [email protected]
Sijian Wang [email protected]

1 Department of Information Technology and Decision Sciences, Old Dominion University, Norfolk, USA
2 Shanghai Advanced Institute of Finance, Shanghai Jiao Tong University, Shanghai, China
3 Department of Statistics and Biostatistics, Rutgers University, Piscataway, USA



Journal of Statistical Theory and Practice (2020) 14:38

1 Introduction

Variable selection is important when data contain a large number of predictors. We often want to determine a smaller subset of predictors that exhibits the strongest effects. Numerous variable selection methods have been proposed, such as best subset selection, stepwise regression, and penalized regression, including the lasso [26], the smoothly clipped absolute deviation (SCAD) penalty [8], the minimax concave penalty (MCP) [31], and others. In this paper, we consider the variable selection problem in the following two linear regression models:

$$
Y = \eta + \sum_{j=1}^{p} \alpha_j X_j + \varepsilon, \qquad (1)
$$

$$
Y = \eta + \sum_{j=1}^{p} \alpha_j X_j + \sum_{j=1}^{p} \beta_j X_j^2 + \sum_{1 \le j < k \le p} \gamma_{jk} X_j X_k + \varepsilon. \qquad (2)
$$
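To make the distinction between models (1) and (2) concrete, the sketch below (ours, not the authors' hierarchical standardization procedure) builds the second-order design matrix of model (2), applies regular standardization, fits a lasso with scikit-learn's LassoCV, and then checks the selected terms against the strong heredity constraint. The simulated data, coefficient values, and helper names are illustrative assumptions only.

```python
# Minimal sketch: ordinary lasso on the second-order model (2), followed by a
# strong-heredity check. Nothing here enforces heredity during selection.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))                      # main-effect predictors X_1, ..., X_p

# Second-order design: main effects, squares X_j^2, and interactions X_j X_k (j < k).
squares = X ** 2
inter_idx = [(j, k) for j in range(p) for k in range(j + 1, p)]
interactions = np.column_stack([X[:, j] * X[:, k] for j, k in inter_idx])
Z = np.column_stack([X, squares, interactions])

# An assumed true model that obeys strong heredity: X_1, X_2, and X_1 X_2.
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 0] * X[:, 1] + rng.normal(size=n)

# Regular (column-wise) standardization, then a cross-validated lasso fit.
Z_std = (Z - Z.mean(axis=0)) / Z.std(axis=0)
coef = LassoCV(cv=5).fit(Z_std, y).coef_

main_sel = {j for j in range(p) if coef[j] != 0}
quad_sel = {j for j in range(p) if coef[p + j] != 0}
int_sel = {inter_idx[m] for m in range(len(inter_idx)) if coef[2 * p + m] != 0}

# Strong heredity: a selected X_j^2 needs X_j; a selected X_j X_k needs both X_j and X_k.
violations = [f"X{j+1}^2" for j in quad_sel if j not in main_sel]
violations += [f"X{j+1}X{k+1}" for j, k in int_sel
               if j not in main_sel or k not in main_sel]
print("selected main effects:", sorted(main_sel))
print("strong-heredity violations:", violations or "none")
```

With an ordinary lasso, the printed violation list may well be nonempty: the penalty treats main effects, quadratic terms, and interactions symmetrically, which is the gap that heredity-aware procedures, including the hierarchical standardization proposed here, are designed to close.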