Semiparametric Likelihood-based Inference for Censored Data with Auxiliary Information from External Massive Data Sources


Acta Mathematicae Applicatae Sinica, English Series. The Editorial Office of AMAS & Springer-Verlag GmbH Germany 2020

Yue-xin FANG¹, Yong ZHOU²,†

¹ School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China

² Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE, and Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai 200062, China (E-mail: [email protected])

Abstract Published auxiliary information can be helpful in conducting statistical inference in a new study. In this paper, we synthesize auxiliary information with semiparametric likelihood-based inference for censored data when the total sample size is available. We express the auxiliary information as constraints on the regression coefficients and the covariate distribution, then use the empirical likelihood method for general estimating equations to improve the efficiency of the estimators of the parameters of interest in the specified model. The consistency and asymptotic normality of the resulting regression parameter estimators are established. Numerical simulations and an application under different assumed conditions show that the proposed method yields a substantial gain in efficiency for the parameters of interest.

Keywords Auxiliary information; Massive data; Censored data; Empirical likelihood; Estimation equations

2000 MR Subject Classification 62F12

1 Introduction

Auxiliary information from practical data sources with extremely large sample sizes is now increasingly available for research purposes. In particular, when massive data have been collected in historical studies, we often have auxiliary information from previous studies or public databases. How to use such auxiliary information from external sources is a popular and interesting question, and it is helpful to develop techniques and strategies that use it to increase efficiency in statistical inference. [2] showed that census reports provide nearly exact estimates of the moments of the marginal distribution of economic variables, and can be used in combination with cross-sectional or panel samples to improve estimation accuracy in economic studies. Summary statistics have been used to improve estimation efficiency even when the individual-level data from the historical study are unavailable. Recently, [5] used covariate-specific disease prevalence information to increase the power of case-control studies. [4] proposed an efficient estimator for the Cox model with auxiliary subgroup survival information. Similarly, the methods these papers use to synthesize information from different sources were motivated by the empirical likelihood method first introduced by [1]. Empirical likelihood is a nonparametric method of statistical inference; it allows the data analyst to use likelihood Manuscript received October 09, 2018. Accepte