Asymptotic Theory of Statistical Inference
7.1 Estimation

Let \(M = \{p(\boldsymbol{x}, \boldsymbol{\xi})\}\) be a statistical model specified by parameter \(\boldsymbol{\xi}\), which is to be estimated. When we observe \(N\) independent data \(D = \{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N\}\) generated from \(p(\boldsymbol{x}, \boldsymbol{\xi})\), we want to know the underlying parameter \(\boldsymbol{\xi}\). This is a problem of estimation, and an estimator
\[
\hat{\boldsymbol{\xi}} = \boldsymbol{f}\left(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N\right) \tag{7.1}
\]
is a function of \(D\). The estimation error is given by
\[
\boldsymbol{e} = \hat{\boldsymbol{\xi}} - \boldsymbol{\xi}, \tag{7.2}
\]
when \(\boldsymbol{\xi}\) is the true value. The bias of the estimator is defined by
\[
\boldsymbol{b}(\boldsymbol{\xi}) = E\left[\hat{\boldsymbol{\xi}}\right] - \boldsymbol{\xi}, \tag{7.3}
\]
where the expectation is taken with respect to \(p(\boldsymbol{x}, \boldsymbol{\xi})\). An estimator is unbiased when \(\boldsymbol{b}(\boldsymbol{\xi}) = 0\). Asymptotic theory studies the behavior of an estimator when \(N\) is large. When the bias satisfies
\[
\lim_{N \to \infty} \boldsymbol{b}(\boldsymbol{\xi}) = 0, \tag{7.4}
\]
it is asymptotically unbiased. A good estimator is expected to converge to the true parameter as \(N\) tends to infinity, written as
\[
\lim_{N \to \infty} \hat{\boldsymbol{\xi}} = \boldsymbol{\xi}. \tag{7.5}
\]
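As an illustration of (7.3) and (7.4), not taken from the text: the plug-in (maximum likelihood) variance estimator, which divides by \(N\) rather than \(N-1\), has bias \(-\sigma^2/N\). It is biased for every finite \(N\) yet asymptotically unbiased. A minimal Monte Carlo sketch:

```python
import numpy as np

# Sketch (author's illustration): the plug-in variance estimator
# (1/N) * sum((x_i - xbar)^2) has bias b = -sigma^2/N, so it is biased for
# every finite N but asymptotically unbiased in the sense of (7.4).
rng = np.random.default_rng(0)
sigma2 = 4.0  # true variance of the sampling distribution N(0, 4)

def plugin_variance(x):
    # MLE of the variance: divides by N, not N - 1
    return np.mean((x - x.mean()) ** 2)

biases = {}
for N in (10, 100, 1000):
    # Monte Carlo estimate of b = E[sigma2_hat] - sigma2 over 5000 repetitions
    reps = [plugin_variance(rng.normal(0.0, 2.0, N)) for _ in range(5000)]
    biases[N] = np.mean(reps) - sigma2
    print(N, biases[N])  # roughly -sigma2 / N, shrinking toward 0
```

The printed bias is close to \(-\sigma^2/N\) for each sample size and tends to zero as \(N\) grows, matching (7.4).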
© Springer Japan 2016 S. Amari, Information Geometry and Its Applications, Applied Mathematical Sciences 194, DOI 10.1007/978-4-431-55978-8_7
When this holds, an estimator is consistent. The accuracy of an estimator is measured by the error covariance matrix
\[
V = \left(V_{ij}\right), \quad V_{ij} = E\left[\left(\hat{\xi}_i - \xi_i\right)\left(\hat{\xi}_j - \xi_j\right)\right]. \tag{7.6}
\]
It decreases in general in proportion to \(1/N\), so that the estimator \(\hat{\boldsymbol{\xi}}\) becomes sufficiently accurate as \(N\) increases. The well-known Cramér–Rao theorem gives a bound on accuracy.

Theorem 7.1 For an asymptotically unbiased estimator \(\hat{\boldsymbol{\xi}}\), the following inequality holds:
\[
V \ge \frac{1}{N} G^{-1}, \tag{7.7}
\]
\[
E\left[\left(\hat{\xi}_i - \xi_i\right)\left(\hat{\xi}_j - \xi_j\right)\right] \ge \frac{1}{N} g^{ij}, \tag{7.8}
\]
where \(G = \left(g_{ij}\right)\) is the Fisher information matrix, \(G^{-1} = \left(g^{ij}\right)\) is its inverse, and the matrix inequality implies that \(V - G^{-1}/N\) is positive semi-definite.

The maximum likelihood estimator (MLE) is the maximizer of the likelihood,
\[
\hat{\boldsymbol{\xi}}_{\mathrm{MLE}} = \arg\max_{\boldsymbol{\xi}} \prod_{i=1}^{N} p\left(\boldsymbol{x}_i, \boldsymbol{\xi}\right). \tag{7.9}
\]
It is known that the MLE is asymptotically unbiased and its error covariance satisfies
\[
V_{\mathrm{MLE}} = \frac{1}{N} G^{-1} + O\left(\frac{1}{N^2}\right), \tag{7.10}
\]
attaining the Cramér–Rao bound (7.7) asymptotically. Such an estimator is said to be Fisher efficient (first-order efficient).

Remark. We do not mention Bayes estimators, where a prior distribution of parameters is used. However, when the prior distribution is uniform, the MLE is the maximum a posteriori Bayes estimator. Moreover, it has the same asymptotic properties for any regular Bayes prior. Information geometry of Bayes statistics will be touched upon in a later chapter.
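A small Monte Carlo check of the Cramér–Rao bound (7.7), added as an illustration: for the model \(N(\xi, \sigma^2)\) with \(\sigma\) known, the Fisher information is \(g = 1/\sigma^2\) and the MLE is the sample mean, whose variance equals \(\sigma^2/N = 1/(Ng)\), so the bound is attained exactly in this model (not merely asymptotically).

```python
import numpy as np

# Sketch (author's illustration): for x ~ N(xi, sigma^2) with sigma known,
# the Fisher information is g = 1/sigma^2 and the MLE is the sample mean,
# with Var(xbar) = sigma^2/N = 1/(N g): the Cramér–Rao bound (7.7) holds
# with equality here.
rng = np.random.default_rng(1)
xi_true, sigma, N, reps = 2.0, 3.0, 50, 20000

# reps independent experiments, each with N observations; MLE = sample mean
mle = rng.normal(xi_true, sigma, size=(reps, N)).mean(axis=1)
var_mle = mle.var()
cr_bound = sigma**2 / N  # 1/(N g) with g = 1/sigma^2
print(var_mle, cr_bound)  # the two agree up to Monte Carlo error
```

For a model where the bound is attained only asymptotically, the same experiment at increasing \(N\) would show the gap closing at rate \(O(1/N^2)\), as in (7.10).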
7.2 Estimation in Exponential Family

An exponential family is a model having excellent properties such as dual flatness. We begin with an exponential family
\[
p(\boldsymbol{x}, \boldsymbol{\theta}) = \exp\left\{\boldsymbol{\theta} \cdot \boldsymbol{x} - \psi(\boldsymbol{\theta})\right\} \tag{7.11}
\]
to study the statistical theory of estimation, because it is simple and transparent. Given data \(D\), their joint probability distribution is written as
\[
p(D, \boldsymbol{\theta}) = \exp\left[N\left\{\boldsymbol{\theta} \cdot \bar{\boldsymbol{x}} - \psi(\boldsymbol{\theta})\right\}\right], \tag{7.12}
\]
where \(\bar{\boldsymbol{x}}\) is the arithmetic mean of the observed examples,
\[
\bar{\boldsymbol{x}} = \frac{1}{N} \sum_{i=1}^{N} \boldsymbol{x}_i. \tag{7.13}
\]
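Since by (7.12) the likelihood depends on the data only through \(\bar{\boldsymbol{x}}\), the MLE in an exponential family solves \(\psi'(\theta) = \bar{x}\). A minimal sketch, assuming the Bernoulli distribution written in the form (7.11) with \(x \in \{0, 1\}\) and \(\psi(\theta) = \log(1 + e^\theta)\) (this concrete model is the author's choice of example, not from the text):

```python
import math

# Sketch: Bernoulli as an exponential family, p(x, theta) = exp(theta*x - psi(theta))
# with x in {0, 1} and psi(theta) = log(1 + e^theta). The likelihood (7.12)
# depends on the data only through xbar, and the MLE solves psi'(theta) = xbar.
def psi(theta):
    return math.log(1.0 + math.exp(theta))

def psi_prime(theta):
    # eta(theta) = E[x]: the expectation parameter (derivative of psi)
    return 1.0 / (1.0 + math.exp(-theta))

def mle_theta(xbar):
    # inverse of psi': the logit of xbar solves psi'(theta) = xbar
    return math.log(xbar / (1.0 - xbar))

xbar = 0.8  # sufficient statistic, the sample mean (7.13)
theta_hat = mle_theta(xbar)
print(theta_hat, psi_prime(theta_hat))  # psi'(theta_hat) recovers xbar
```

Checking that \(\psi'(\hat{\theta}) = \bar{x}\) confirms the maximizing condition; the closed-form inverse exists here because \(\psi'\) is strictly monotone, which holds in any regular exponential family.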