Automatic bandwidth selection for recursive kernel density estimators with length-biased data
ORIGINAL PAPER
Yousri Slaoui
Received: 10 November 2018 / Accepted: 18 June 2019
© Japanese Federation of Statistical Science Associations 2019
Abstract
In this paper, we propose an automatic bandwidth selection method for the recursive kernel estimators of a probability density function defined by the stochastic approximation algorithm in the case of length-biased data. We compared the proposed plug-in method with the cross-validation method and the so-called smooth bootstrap bandwidth selector via simulations as well as on a real data set. The results showed that, using the selected plug-in bandwidth and some special stepsizes, the proposed recursive estimators can be very competitive with the non-recursive one in terms of estimation error and much better in terms of computational cost.
Keywords Density estimation · Stochastic approximation algorithm · Weighted data · Smoothing, curve fitting
Mathematics Subject Classification Primary 62G07 · 62L20 · 65D10
1 Introduction
Length-biased data arise when the probability that an item is sampled is proportional to its length. This type of data is produced when the probability of selecting an observation depends on its numerical value or on other related covariables. In many applications, it is not possible to observe and record all events that occur, so the probabilities of actual occurrence of events need to be adjusted. The recorded observation may then be assumed to have a probability density function $g(x)$ of the form
Laboratoire de Mathématiques et Application, Université de Poitiers, 86962 Futuroscope, Chasseneuil, France
Japanese Journal of Statistics and Data Science
$$g(x) = \frac{w(x)\, f(x)}{\eta}, \quad x > 0, \qquad \text{with} \qquad \eta = \int_{\mathbb{R}} w(x)\, f(x)\, dx,$$
where $w(x)$ is a known non-negative function called the weighting function and $f(x)$ is the original density. Rao (1965) traced the concept of a weighted distribution to the study of the effects of methods of ascertainment upon the estimation of frequencies (Fisher 1934). Another interesting contribution is given by Hanin et al. (1997), who obtained the distribution of tumor size at detection, assuming a simple limiting form, with age at detection tending to infinity; this distribution is a weighted distribution with weight function $w(x) = x^r$, where $r = \min\{1, h\}$ and $h$ is a parameter of the progression time distribution.

In this paper, we consider the case $w(x) = x$, known as length-biased data, where the observations are typically non-negative. Thus we consider a random sample $Y_1, \ldots, Y_n$ of non-negative independent and identically distributed random variables with common density function $f_Y$ given by

$$f_Y(x) = \frac{x\, f(x)}{\eta}, \quad x > 0, \qquad \text{with} \qquad \eta = \int_{\mathbb{R}} x\, f(x)\, dx,$$

where $\eta$ can be estimated by $\eta_n = n \left( \sum_{i=1}^{n} Y_i^{-1} \right)^{-1}$, called the harmonic mean of $Y$. Cox (2005) suggested the harmonic mean and showed its properties. These types of data frequently arise in survey sa
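
The harmonic-mean estimator of the normalizing constant can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the paper's procedure: the original density $f$ is assumed to be Gamma with shape 2 and scale 1.5 (a hypothetical choice made here for illustration), in which case the length-biased density $x f(x) / \eta$ is Gamma with shape 3 and the same scale, and the normalizing constant equals 3. The function name `harmonic_mean_estimate` and all variable names are illustrative.

```python
import random


def harmonic_mean_estimate(sample):
    """Estimate eta = integral of x f(x) dx by n / sum(1 / Y_i)."""
    return len(sample) / sum(1.0 / y for y in sample)


# Assumed setup (not from the paper): f is Gamma(shape=2, scale=1.5),
# so eta = shape * scale = 3.0 and f_Y(x) = x f(x) / eta is Gamma(shape=3, scale=1.5).
shape, scale = 2.0, 1.5
rng = random.Random(0)  # fixed seed so the run is reproducible

# Draw a length-biased sample Y_1, ..., Y_n directly from f_Y.
sample = [rng.gammavariate(shape + 1.0, scale) for _ in range(20_000)]

eta_n = harmonic_mean_estimate(sample)
naive_mean = sum(sample) / len(sample)

print(f"harmonic-mean estimate of eta: {eta_n:.3f} (true value {shape * scale})")
print(f"plain sample mean of Y:        {naive_mean:.3f}")
```

Under these assumptions the plain sample mean of the length-biased observations targets the mean of the Gamma distribution with shape 3 and scale 1.5, namely 4.5, rather than 3, which is why the harmonic mean, and not the arithmetic mean, is the natural estimator of the normalizing constant here.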