Model selection via information criteria



Zsolt Talata (Budapest)

Dedicated to Endre Csáki and Pál Révész on the occasion of their 70th birthdays

Abstract. This is a survey of the information criterion approach to model selection problems. New results on context tree estimation and on the estimation of the basic neighborhood of Markov random fields are also mentioned.

1. The model selection problem

Let a stochastic process { Xt , t ∈ T } be given, where each Xt is a random variable with values a ∈ A, and T is an index set. The joint distribution of the random variables Xt , t ∈ T , will be referred to as the distribution of the process and will be denoted by Q. A model of the process determines a hypothetical distribution of the process, or a collection of hypothetical distributions. Typically, a model is determined by a structure parameter k with values in some set K, and by a parameter vector θk ∈ Θk ⊂ R^dk ; this model is denoted by Mθk . Given the feasible models of the process, they can be arranged into model classes according to the structure parameter: Mk = { Mθk , θk ∈ Θk ⊂ R^dk }.

Statistical inference about the process is drawn based on a realization { xt , t ∈ T } of the process observed in the range Rn ⊂ T , where Rn extends with n. Thus the n'th sample is x(n) = { xt , t ∈ Rn }. Some typical examples of processes and their models are listed below.

In the case of density function estimation, T = N and the random variables Xt , t ∈ N, are independent and identically distributed (i.i.d.) with density function fθk . The n'th sample is { xi , i = 1, . . . , n }.

Mathematics subject classification numbers: 62M99, 60J10, 62M40.
Key words and phrases: model selection, information criterion, minimum description length, Bayesian information criterion, context tree estimation, Markov random field, basic neighborhood estimation.
Supported by the Hungarian National Foundation for Scientific Research, Grant T046376.
0031-5303/2005/$20.00 © Akadémiai Kiadó, Budapest
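To make the model-class notation concrete, the following is a minimal sketch (not from the paper) of selecting the structure parameter k for an i.i.d. Gaussian sample by a BIC-type criterion, where each candidate class penalizes its maximized log-likelihood by (dk/2) log n. The two model classes, the simulated data, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(loc=0.8, scale=1.0, size=n)  # illustrative i.i.d. sample x(n)

def gauss_loglik(x, mu, sigma2):
    """Log-likelihood of an i.i.d. Gaussian sample with mean mu, variance sigma2."""
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

# Model class M1: zero-mean Gaussian, d_1 = 1 free parameter (the variance).
s2_0 = np.mean(x ** 2)                 # MLE of variance under mu = 0
ll1 = gauss_loglik(x, 0.0, s2_0)

# Model class M2: general Gaussian, d_2 = 2 free parameters (mean and variance).
mu_hat = np.mean(x)
s2_hat = np.var(x)
ll2 = gauss_loglik(x, mu_hat, s2_hat)

def bic_score(loglik, d, n):
    """BIC-type score: negated maximized log-likelihood plus (d/2) log n."""
    return -loglik + 0.5 * d * np.log(n)

scores = {1: bic_score(ll1, 1, n), 2: bic_score(ll2, 2, n)}
best = min(scores, key=scores.get)
print(best)  # -> 2: the true mean 0.8 is far from 0, so the richer class wins
```

The penalty term (d/2) log n is what distinguishes this from plain maximum likelihood, which would always prefer the larger class.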

Akadémiai Kiadó, Budapest / Springer, Dordrecht


Polynomial fitting involves T ⊆ R, where T is a countable set, A = R, and the model

Xt = θk[0] + θk[1] t + θk[2] t^2 + · · · + θk[k−1] t^{k−1} + Zt ,

where Zt , t ∈ T , are independent random variables with normal distribution, zero mean, and unknown common variance, and θk[i] is the i'th component of the k-dimensional parameter vector θk . Here the structure parameter k ∈ N is the degree of the polynomial plus 1, and the n'th sample is { xt , t ∈ {t1 , . . . , tn} ⊂ T }.

The process with T = N, A = R is an autoregressive (AR) process of order k if

Xt = Σ_{i=1}^{k} ai Xt−i + Zt ,

where Zt , t ∈ N, are independent random variables with normal distribution, zero mean, and unknown common variance, and ai ∈ R, i = 1, . . . , k, form the parameter vector θk . Here the structure parameter k ∈ N is the number of coefficients ai , and the n'th sample is { xi , i = 1, . . . , n }. The autoregressive moving average (ARMA) process is similar t
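To illustrate how an information criterion estimates the AR order k, here is a sketch (not from the paper) under the Gaussian assumption above: each candidate AR(k) is fitted by least squares on a common subsample, and a BIC-type score, (m/2) log σ̂² plus the penalty (k/2) log m, is minimized over k. The simulated coefficients, the maximal order kmax, and all helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
# Simulate an illustrative AR(2) process: X_t = 0.5 X_{t-1} - 0.3 X_{t-2} + Z_t
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

def ar_bic(x, k, kmax):
    """Least-squares AR(k) fit on a common sample, returning a BIC-type score."""
    n = len(x)
    y = x[kmax:]               # same target vector for every candidate k
    m = len(y)
    if k == 0:
        resid = y
    else:
        # Design matrix of lagged values: column i holds x_{t-i}
        X = np.column_stack([x[kmax - i: n - i] for i in range(1, k + 1)])
        a, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ a
    sigma2 = np.mean(resid ** 2)  # MLE of the innovation variance
    # Up to constants, -max log-likelihood = (m/2) log sigma2; add (k/2) log m
    return 0.5 * m * np.log(sigma2) + 0.5 * k * np.log(m)

kmax = 6
scores = [ar_bic(x, k, kmax) for k in range(kmax + 1)]
k_hat = int(np.argmin(scores))
print(k_hat)  # BIC is consistent for AR order, so k_hat concentrates on 2
```

Fitting all candidates on the rows indexed from kmax onward keeps the likelihoods comparable across different orders, which matters when the scores differ only by a few penalty units.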