7 Asymptotic Theory of Statistical Inference
7.1 Estimation

Let \(M = \{p(\boldsymbol{x}, \boldsymbol{\xi})\}\) be a statistical model specified by parameter \(\boldsymbol{\xi}\), which is to be estimated. When we observe \(N\) independent data \(D = \{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N\}\) generated from \(p(\boldsymbol{x}, \boldsymbol{\xi})\), we want to know the underlying parameter \(\boldsymbol{\xi}\). This is a problem of estimation, and an estimator

\[
\hat{\boldsymbol{\xi}} = \boldsymbol{f}(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N) \tag{7.1}
\]

is a function of \(D\). The estimation error is

\[
\boldsymbol{e} = \hat{\boldsymbol{\xi}} - \boldsymbol{\xi}, \tag{7.2}
\]

where \(\boldsymbol{\xi}\) is the true value. The bias of the estimator is defined by

\[
\boldsymbol{b}(\boldsymbol{\xi}) = E\bigl[\hat{\boldsymbol{\xi}}\bigr] - \boldsymbol{\xi}, \tag{7.3}
\]

where the expectation is taken with respect to \(p(\boldsymbol{x}, \boldsymbol{\xi})\). An estimator is unbiased when \(\boldsymbol{b}(\boldsymbol{\xi}) = \boldsymbol{0}\). Asymptotic theory studies the behavior of an estimator when \(N\) is large. When the bias satisfies

\[
\lim_{N \to \infty} \boldsymbol{b}(\boldsymbol{\xi}) = \boldsymbol{0}, \tag{7.4}
\]

the estimator is asymptotically unbiased. A good estimator is expected to converge to the true parameter as \(N\) tends to infinity,

\[
\lim_{N \to \infty} \hat{\boldsymbol{\xi}} = \boldsymbol{\xi}. \tag{7.5}
\]
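As a concrete illustration of (7.3) and (7.4), the following Python sketch estimates by Monte Carlo the bias of the maximum-likelihood variance estimator of a Gaussian, which divides by \(N\) rather than \(N-1\) and is therefore biased for finite \(N\) but asymptotically unbiased. The true parameters, sample sizes, and number of trials are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 4.0          # true parameters of N(mu, sigma2); illustrative choice
trials = 20000                 # Monte Carlo repetitions per sample size

for N in (5, 20, 100, 1000):
    x = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))
    # The MLE of the variance divides by N (numpy's default ddof=0)
    sigma2_hat = x.var(axis=1)
    bias = sigma2_hat.mean() - sigma2   # Monte Carlo estimate of b(xi)
    print(f"N={N:5d}  estimated bias {bias:+.4f}  (theory: {-sigma2/N:+.4f})")
```

The printed bias shrinks like \(-\sigma^2/N\), in accordance with (7.4); the estimator also converges to the true value in the sense of (7.5).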
 
When (7.5) holds, the estimator is consistent.

The accuracy of an estimator is measured by the error covariance matrix \(\mathbf{V} = \bigl(V_{ij}\bigr)\),

\[
V_{ij} = E\Bigl[\bigl(\hat{\xi}_i - \xi_i\bigr)\bigl(\hat{\xi}_j - \xi_j\bigr)\Bigr]. \tag{7.6}
\]

It decreases in general in proportion to \(1/N\), so that the estimator \(\hat{\boldsymbol{\xi}}\) becomes sufficiently accurate as \(N\) increases. The well-known Cramér–Rao theorem gives a bound on the accuracy.

Theorem 7.1 For an asymptotically unbiased estimator \(\hat{\boldsymbol{\xi}}\), the following inequality holds:

\[
\mathbf{V} \ge \frac{1}{N}\,\mathbf{G}^{-1}, \tag{7.7}
\]

\[
E\Bigl[\bigl(\hat{\xi}_i - \xi_i\bigr)\bigl(\hat{\xi}_j - \xi_j\bigr)\Bigr] \ge \frac{1}{N}\,g^{ij}, \tag{7.8}
\]

where \(\mathbf{G} = \bigl(g_{ij}\bigr)\) is the Fisher information matrix, \(\mathbf{G}^{-1} = \bigl(g^{ij}\bigr)\) is its inverse, and the matrix inequality means that \(\mathbf{V} - \mathbf{G}^{-1}/N\) is positive semi-definite.

The maximum likelihood estimator (MLE) is the maximizer of the likelihood,

\[
\hat{\boldsymbol{\xi}}_{\mathrm{MLE}} = \arg\max_{\boldsymbol{\xi}} \prod_{i=1}^{N} p\bigl(\boldsymbol{x}_i, \boldsymbol{\xi}\bigr). \tag{7.9}
\]
It is known that the MLE is asymptotically unbiased and that its error covariance satisfies

\[
\mathbf{V}_{\mathrm{MLE}} = \frac{1}{N}\,\mathbf{G}^{-1} + O\!\left(\frac{1}{N^2}\right), \tag{7.10}
\]

attaining the Cramér–Rao bound (7.7) asymptotically. Such an estimator is said to be Fisher efficient (first-order efficient).

Remark We do not discuss Bayes estimators, for which a prior distribution over the parameters is used. However, when the prior distribution is uniform, the MLE coincides with the maximum a posteriori Bayes estimator. Moreover, the MLE has the same asymptotic properties for any regular Bayes prior. The information geometry of Bayesian statistics will be touched upon in a later chapter.
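A Monte Carlo sketch of (7.10), under illustrative assumptions: for the mean of a Gaussian with known variance, the Fisher information is \(G = 1/\sigma^2\) and the MLE is the sample mean, so \(N \cdot \mathrm{Var}\bigl(\hat{\xi}_{\mathrm{MLE}}\bigr)\) should match \(G^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.0, 2.0
G = 1.0 / sigma**2        # Fisher information of N(mu, sigma^2) in mu (sigma known)
trials = 10000

for N in (10, 100, 1000):
    x = rng.normal(mu, sigma, size=(trials, N))
    mle = x.mean(axis=1)                  # MLE of mu is the sample mean
    print(f"N={N:5d}  N*Var(MLE) = {N * mle.var():.4f}   1/G = {1/G:.4f}")
```

In this particular model the bound (7.7) is attained exactly for every \(N\), since the sample mean is unbiased with variance \(\sigma^2/N\); in general models the \(O(1/N^2)\) correction in (7.10) is nonzero.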
 
7.2 Estimation in Exponential Family

An exponential family is a model having excellent properties such as dual flatness. We begin with an exponential family

\[
p(\boldsymbol{x}, \boldsymbol{\theta}) = \exp\{\boldsymbol{\theta} \cdot \boldsymbol{x} - \psi(\boldsymbol{\theta})\} \tag{7.11}
\]

to study the statistical theory of estimation, because it is simple and transparent. Given data \(D\), their joint probability distribution is written as

\[
p(D, \boldsymbol{\theta}) = \exp\bigl[N\{\boldsymbol{\theta} \cdot \bar{\boldsymbol{x}} - \psi(\boldsymbol{\theta})\}\bigr], \tag{7.12}
\]

where \(\bar{\boldsymbol{x}}\) is the arithmetic mean of the observed examples,

\[
\bar{\boldsymbol{x}} = \frac{1}{N}\sum_{i=1}^{N} \boldsymbol{x}_i. \tag{7.13}
\]
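Equation (7.12) shows that the likelihood depends on the data only through \(\bar{\boldsymbol{x}}\), so maximizing it reduces to solving \(\nabla\psi(\hat{\boldsymbol{\theta}}) = \bar{\boldsymbol{x}}\). A minimal sketch for the Bernoulli family, where \(\psi(\theta) = \log(1 + e^{\theta})\) and the solution is \(\hat{\theta} = \operatorname{logit}(\bar{x})\); the data settings are illustrative assumptions.

```python
import numpy as np

# Bernoulli as an exponential family: p(x, theta) = exp(theta*x - psi(theta)),
# x in {0, 1}, with psi(theta) = log(1 + e^theta) and psi'(theta) = sigmoid(theta).
rng = np.random.default_rng(3)
theta_true = 0.8
p = 1.0 / (1.0 + np.exp(-theta_true))
x = rng.binomial(1, p, size=10000)

x_bar = x.mean()                         # the data enter only through x_bar
theta_hat = np.log(x_bar / (1 - x_bar))  # solves psi'(theta) = x_bar (logit of x_bar)
print(theta_true, theta_hat)
```

The condition \(\nabla\psi(\hat{\boldsymbol{\theta}}) = \bar{\boldsymbol{x}}\) is the moment-matching form of the MLE in an exponential family; in the dual (expectation) coordinates \(\boldsymbol{\eta} = \nabla\psi(\boldsymbol{\theta})\) it reads simply \(\hat{\boldsymbol{\eta}} = \bar{\boldsymbol{x}}\).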