Global statistical inference for the difference between two regression mean curves with covariates possibly partially missing

Li Cai1 · Suojin Wang2

Received: 26 March 2020 / Revised: 15 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

In two-sample problems, it is of interest to examine the difference between the two regression curves or to detect whether certain functions adequately describe the overall trend of the difference. In this paper, we propose a simultaneous confidence band (SCB) as a global inference method with asymptotically correct coverage probabilities for the difference curve, based on the weighted local linear kernel regression estimates in each sample. Our procedure allows for random designs, different sample sizes, heteroscedastic errors, and especially missing covariates. Simulation studies investigating the finite sample properties of the new SCB support our asymptotic theory. The proposed SCB is used to analyze two data sets, leading to a number of discoveries: one consists of human event-related potentials data that are fully observed, and the other is the Canada 2010/2011 youth student survey data with partially missing covariates.

Keywords Covariates missing at random · Gaussian process · Simultaneous confidence band · Weighted local linear regression

Suojin Wang [email protected]

1 School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, China

2 Department of Statistics, Texas A&M University, College Station, TX 77843, USA

1 Introduction

The comparison of two regression functions is a fundamental problem in applied regression analysis; see, for instance, Munk and Dette (1998); Neumeyer and Sperlich (2006); González-Manteiga and Crujeiras (2013); Park et al. (2014); Pardo-Fernández et al. (2015b); Zhao et al. (2020). Consider the following two nonparametric regression models
in the general settings of heteroscedastic errors and different sample sizes,

$$Y_{ik} = m_k(X_{ik}) + \varepsilon_{ik}, \quad i = 1, 2, \ldots, n_k, \quad k = 1, 2, \tag{1}$$

where $X_{ik}$ ($i = 1, 2, \ldots, n_k$, $k = 1, 2$) are independent and identically distributed (i.i.d.) covariates with density function $f_k(x)$ on an interval $[a, b]$, $Y_{ik}$ are response variables, and $(\varepsilon_{ik})_{i=1}^{n_k}$ are unobserved i.i.d. random errors with $E(\varepsilon_{ik} \mid X_{ik}) = 0$ and $E(\varepsilon_{ik}^2 \mid X_{ik}) = \sigma_k^2(X_{ik})$. The regression mean functions $m_k(x)$ and the variance functions $\sigma_k^2(x)$ are generally unknown. In this paper, we are interested in examining the global shape of the unknown difference curve $m_1(x) - m_2(x)$ by constructing an asymptotically accurate simultaneous confidence band (SCB) as a function of the covariate $x$. As a motivating example, let us consider the Canada 2010/2011 youth student survey data set, in which some covariates are missing. It would be interesting to see how female students’ self-esteem differs from that of male students in terms of Body Mass Index (BMI), and how White students’ self-esteem differs from that of Asian