High-dimensional sign-constrained feature selection and grouping
Shanshan Qin¹ · Hao Ding¹ · Yuehua Wu¹ · Feng Liu²

Received: 10 January 2020 / Revised: 4 September 2020 / Accepted: 8 September 2020
© The Institute of Statistical Mathematics, Tokyo 2020
Abstract
In this paper, we propose a non-negative feature selection/feature grouping (nnFSG) method for general sign-constrained high-dimensional regression problems that allows regression coefficients to be disjointly homogeneous, with sparsity as a special case. To solve the resulting non-convex optimization problem, we provide an algorithm that combines difference-of-convex programming, the augmented Lagrangian method, and coordinate descent. Furthermore, we show that the nnFSG method consistently recovers the oracle estimate and that its mean-squared errors are bounded. Additionally, we examine the performance of our method using finite-sample simulations and apply it to a real protein mass spectrum dataset.

Keywords: Difference of convex programming · Feature grouping · Feature selection · High-dimensional · Non-negative
* Hao Ding ([email protected])
Shanshan Qin ([email protected])
Yuehua Wu ([email protected])
Feng Liu ([email protected])

1 Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada
2 Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, NSW 2007, Australia
1 Introduction

In recent decades, high-dimensional problems have appeared in many fields owing to the increasing prevalence of big data. A classical model for data analysis is the linear regression model
$$y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \epsilon_i \quad (i = 1, \dots, n), \tag{1}$$
where $y_i$ are response observations, $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})^{\top}$ are $p$-dimensional vectors of predictors, $\boldsymbol{\beta} \in \mathbb{R}^p$ is a vector of unknown regression coefficients, and the random errors $\epsilon_i$ are independent of $\mathbf{x}_i$. Regression analysis aims to identify the explanatory variables relevant to the response and to achieve high prediction accuracy (Rekabdarkolaee et al. 2017). In the high-dimensional setting, $p$ is at least of the same order of magnitude as $n$, say $p = O(n)$ ($p$ is not fixed), or $p \gg n$, in which case $\boldsymbol{\beta}$ is usually assumed to be sparse, i.e., only a small set of its elements are nonzero (Slawski and Hein 2013). For high-dimensional regression problems, regularization methods are of critical importance, and much work has been devoted to exploiting the sparseness of regression vectors. Examples include bridge regression (Frank and Friedman 1993), the Lasso (Tibshirani 1996), SCAD (Fan and Li 2001), the elastic net (Zou and Hastie 2005), the adaptive Lasso (Zou 2006), and MCP (Zhang 2010). Moreover, extracting lower-dimensional structure defined by groups of coefficients has received increasing attention. One can turn to Tibshirani et al. (2005), Yuan and Lin (2006), Huang et al. (2009), She (2010), Jang et al. (2011), Tibshirani and Taylor (2011), Shen et al. (2012a), Yang et al. (2012), Zhu et al. (2013), and Xiang et al. (2015), among others.
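As a point of reference only (this is not the nnFSG algorithm developed in this paper), the following minimal Python sketch illustrates sign-constrained sparse estimation for model (1), using scikit-learn's Lasso with a non-negativity constraint; the simulated data, the dimensions n and p, and the penalty level alpha are illustrative assumptions.

```python
# Minimal sketch: non-negative sparse regression for model (1).
# Not the nnFSG method of this paper; a sign-constrained Lasso baseline.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 200                       # high-dimensional setting: p > n
beta = np.zeros(p)                    # sparse, non-negative truth with
beta[:5] = [2.0, 2.0, 1.0, 1.0, 0.5]  # grouped (homogeneous) coefficients
X = rng.standard_normal((n, p))
y = X @ beta + 0.1 * rng.standard_normal(n)

# positive=True restricts every coefficient to [0, +inf);
# the L1 penalty (alpha) induces sparsity on top of the sign constraint.
fit = Lasso(alpha=0.05, positive=True, fit_intercept=False).fit(X, y)
print("selected features:", np.flatnonzero(fit.coef_))
```

Under positive=True, scikit-learn's coordinate-descent solver clips each coefficient update at zero, so negative effects are excluded by construction; the nnFSG method studied here goes further by also encouraging equal values within unknown groups of coefficients.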