Sliced inverse median difference regression


ORIGINAL PAPER

Stephen Babos¹ • Andreas Artemiou¹

Accepted: 19 January 2020 © The Author(s) 2020

Abstract In this paper we propose a sufficient dimension reduction algorithm based on the difference of inverse medians. The classic methodology based on inverse means in each slice was recently extended, by using inverse medians, to make existing methodology robust in the presence of outliers. Our effort is focused on using differences between inverse medians in pairs of slices. We demonstrate that our method outperforms existing methods in the presence of outliers. We also propose a second algorithm which is not affected by the ordering of slices when the response variable is categorical with no underlying ordering of its values.

Keywords Sufficient dimension reduction · Robust · Conditional independence · Categorical responses

1 Introduction

Sufficient Dimension Reduction (SDR) is a class of dimension reduction techniques used in regression to address the high dimensionality of a predictor vector $X \in \mathbb{R}^p$ when a response variable $Y$ (assumed univariate without loss of generality) is regressed on $X$. In other words, in SDR we are trying to estimate a $p \times d$ ($d < p$) matrix $\beta$ such that

$$Y \perp\!\!\!\perp X \mid \beta^T X. \tag{1}$$
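To make (1) concrete, the following is a minimal sketch of the classic slice-mean estimator of $\beta$ (Sliced Inverse Regression, Li 1991), i.e. the inverse-means methodology referred to in the abstract. It is not the median-difference method proposed in this paper. The function name `sir_directions`, its parameters, and the simulated model are illustrative assumptions.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, d=1):
    """Sketch of classic SIR: estimate a p x d basis for the column space of beta."""
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) Sigma^{-1/2}.
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ sigma_inv_sqrt
    # Slice the sample by the order of y and average Z within each slice.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Eigenvectors for the d largest eigenvalues of the candidate matrix.
    _, v = np.linalg.eigh(M)          # eigh returns ascending eigenvalues
    top = v[:, ::-1][:, :d]
    # Map back to the original predictor scale and normalize each column.
    B = sigma_inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)

# Hypothetical model: Y depends on X only through its first coordinate,
# so (1) holds with beta = (1, 0, 0, 0, 0)^T and d = 1.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
y = X[:, 0] + 0.1 * rng.standard_normal(2000)
B_hat = sir_directions(X, y)
print(np.round(B_hat.ravel(), 3))
```

The estimated column is identified only up to sign and scale, since (1) depends on $\beta$ only through the space its columns span. Replacing the slice means with slice medians would give a variant in the spirit of the robust extensions discussed in this paper, but the exact form of those estimators is not reproduced here.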

The space spanned by the columns of $\beta$ is called a Dimension Reduction Subspace (DRS). Many different matrices $\beta$ satisfy (1), and the main objective is to estimate the one with the minimum dimension $d$. The minimum DRS is known as the Central Dimension Reduction Space (CDRS), or simply the Central Space (CS), and is denoted by $\mathcal{S}_{Y|X}$. The CS exists under some mild conditions, which we assume to hold in this paper (see Yin et al. 2008).

Andreas Artemiou [email protected]
¹ School of Mathematics, Cardiff University, Senghennydd Road, Cardiff CF24 4AG, Wales, UK

A number of methods have been proposed in the SDR literature. Probably the best known and most frequently used class is the one based on inverse moments; see Li (1991), Cook and Weisberg (1991) and Li and Wang (2007), among others. A comprehensive review of this methodology can be found in Li (2018). Inverse-moment based methods such as Sliced Inverse Regression (SIR), introduced by Li (1991), have two main drawbacks. The first is their dependence on moments, which are sensitive to outliers. To address this, Gather et al. (2001), Dong et al. (2015) and Christou (2018) suggested using inverse medians instead of means. The second is their dependence on the number of slices into which the range of $Y$ is discretized, especially in methods like Sliced Average Variance Estimation (SAVE; Cook and Weisberg 1991), where the second inverse moment is used as well. Zhu et al. (2010) suggested a cumulative slicing approach (Cumulative Mean Estimation, CUME) to avoid using the number of slices as a tuning parameter. Artemiou and Tian (2015) identified that the solution given