Model-free feature screening via distance correlation for ultrahigh dimensional survival data
Jing Zhang¹ · Yanyan Liu² · Hengjian Cui³
Received: 1 February 2020 / Revised: 22 July 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
With the explosion of ultrahigh dimensional data in various fields, many sure independent screening methods have been proposed to reduce the dimensionality of data from a large scale to a relatively moderate scale. For censored survival data, the existing screening methods mainly adopt the Kaplan–Meier estimator to handle censoring, which may not perform well under heavy censoring. In this article, we propose a novel sure independent screening procedure for ultrahigh dimensional survival data, based on distance correlation after standardizing the marginal variables. It is a model-free approach and does not involve the Kaplan–Meier estimator, so its performance is much more robust than that of the existing methods. Furthermore, the proposed method enjoys other advantages: it avoids the complication of specifying an actual model from a large number of covariates; it enjoys the sure screening property and ranking consistency under some mild regularity conditions; and it does not require any complicated numerical optimization, so the corresponding calculation is simple and fast. Extensive numerical studies demonstrate that the proposed method compares favorably with the existing methods. As an illustration, we apply the proposed method to a gene expression data set.

Keywords Distance correlation · Model-free screening · Sure screening property · Survival data · Ultrahigh dimensional data
¹ School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China
² School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
³ School of Mathematical Sciences, Capital Normal University, Beijing 100875, China
1 Introduction
With the rapid development of modern technology, ultrahigh dimensional data can be collected at relatively low cost and have appeared in diverse fields of scientific research. For ultrahigh dimensional data, the dimensionality p grows very rapidly with the sample size n (e.g., p = exp(n^α) with α > 0), and the existing regularization approaches (e.g., Tibshirani 1996; Fan and Li 2001; Candes and Tao 2007; Zhang 2010) may not perform well because of challenges in computational expediency, statistical accuracy and algorithmic stability (Fan et al. 2009). Marginal feature screening has been demonstrated to be an effective dimension reduction method for ultrahigh dimensional data and has received much attention in the recent literature. The main idea is to use marginal information to filter out irrelevant variables so as to reduce the dimensionality of the data. Compared to variable selection, the goal of feature screening is less ambitious as it only aims to filter out a majority of inactive variables,
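
To make the idea of marginal screening concrete, the following is a minimal sketch of a generic screening step that ranks covariates by their sample distance correlation with a response and keeps the top d. It is an illustration under simplifying assumptions and not the procedure proposed in this article: it treats the response as fully observed (no censoring), it omits the standardization of marginal variables, and the function names (`distance_correlation`, `screen`) and the cutoff `d` are hypothetical choices made here.

```python
import numpy as np


def _double_centered_distances(v):
    """Pairwise Euclidean distance matrix of v, double-centered."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(v.shape[0], -1)  # treat each row as one observation
    diff = v[:, None, :] - v[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    row_mean = dist.mean(axis=1, keepdims=True)
    col_mean = dist.mean(axis=0, keepdims=True)
    return dist - row_mean - col_mean + dist.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (V-statistic form)."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant variable carries no dependence information
    # max(...) guards against tiny negative values from rounding
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen(X, y, d):
    """Return indices of the d covariates with the largest marginal
    distance correlation with y, together with all marginal scores.

    Assumption: y is an uncensored response; handling censoring is
    outside the scope of this sketch.
    """
    scores = np.array(
        [distance_correlation(X[:, j], y) for j in range(X.shape[1])]
    )
    keep = np.argsort(-scores)[:d]
    return keep, scores
```

A call such as `keep, scores = screen(X, y, d=20)` on an n × p covariate matrix returns the retained column indices. Each marginal score costs O(n²) time and memory, so the whole pass costs O(p n²) and requires no numerical optimization. This is the kind of simplicity that motivates screening as a first step before a more refined variable selection method.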