Asymptotic Theory of Statistical Inference

7.1 Estimation

Let \(M = \{p(\boldsymbol{x}, \boldsymbol{\xi})\}\) be a statistical model specified by a parameter \(\boldsymbol{\xi}\), which is to be estimated. When we observe \(N\) independent data \(D = \{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N\}\) generated from \(p(\boldsymbol{x}, \boldsymbol{\xi})\), we want to know the underlying parameter \(\boldsymbol{\xi}\). This is the problem of estimation, and an estimator

\[
\hat{\boldsymbol{\xi}} = f(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N) \tag{7.1}
\]

is a function of \(D\). The estimation error is

\[
\boldsymbol{e} = \hat{\boldsymbol{\xi}} - \boldsymbol{\xi}, \tag{7.2}
\]

where \(\boldsymbol{\xi}\) is the true value. The bias of the estimator is defined by

\[
\boldsymbol{b}(\boldsymbol{\xi}) = E\bigl[\hat{\boldsymbol{\xi}}\bigr] - \boldsymbol{\xi}, \tag{7.3}
\]

where the expectation is taken with respect to \(p(\boldsymbol{x}, \boldsymbol{\xi})\). An estimator is unbiased when \(\boldsymbol{b}(\boldsymbol{\xi}) = \boldsymbol{0}\). Asymptotic theory studies the behavior of an estimator when \(N\) is large. When the bias satisfies

\[
\lim_{N \to \infty} \boldsymbol{b}(\boldsymbol{\xi}) = \boldsymbol{0}, \tag{7.4}
\]

the estimator is asymptotically unbiased. A good estimator is expected to converge to the true parameter as \(N\) tends to infinity,

\[
\lim_{N \to \infty} \hat{\boldsymbol{\xi}} = \boldsymbol{\xi}. \tag{7.5}
\]

When this holds, the estimator is consistent.
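As a concrete illustration of (7.3)–(7.5), the variance estimator \(\hat{\sigma}^2 = \frac{1}{N}\sum_i (x_i - \bar{x})^2\) of a Gaussian sample has bias \(-\sigma^2/N\): it is biased for every finite \(N\), yet asymptotically unbiased and consistent. The following Monte Carlo sketch (an illustration assuming Python with NumPy, not part of the original text) checks this numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0  # true variance of the Gaussian model
trials = 20000

for N in (10, 100, 1000):
    # Draw `trials` independent samples of size N and compute the
    # estimator sigma2_hat = (1/N) * sum (x_i - x_bar)^2 for each.
    x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
    sigma2_hat = x.var(axis=1)  # NumPy's default ddof=0 divides by N
    bias = sigma2_hat.mean() - sigma2
    print(f"N={N:5d}  empirical bias={bias:+.4f}  exact bias={-sigma2/N:+.4f}")
```

The empirical bias shrinks like \(1/N\), matching the exact value \(-\sigma^2/N\), so this estimator satisfies both (7.4) and (7.5).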


The accuracy of an estimator is measured by the error covariance matrix \(\mathbf{V} = \bigl(V_{ij}\bigr)\),

\[
V_{ij} = E\Bigl[\bigl(\hat{\xi}_i - \xi_i\bigr)\bigl(\hat{\xi}_j - \xi_j\bigr)\Bigr]. \tag{7.6}
\]

It decreases in general in proportion to \(1/N\), so that the estimator \(\hat{\boldsymbol{\xi}}\) becomes sufficiently accurate as \(N\) increases. The well-known Cramér–Rao theorem gives a bound on the accuracy.

Theorem 7.1. For an asymptotically unbiased estimator \(\hat{\boldsymbol{\xi}}\), the following inequality holds:

\[
\mathbf{V} \ge \frac{1}{N}\,\mathbf{G}^{-1}, \tag{7.7}
\]
\[
E\Bigl[\bigl(\hat{\xi}_i - \xi_i\bigr)\bigl(\hat{\xi}_j - \xi_j\bigr)\Bigr] \ge \frac{1}{N}\,g^{ij}, \tag{7.8}
\]

where \(\mathbf{G} = \bigl(g_{ij}\bigr)\) is the Fisher information matrix, \(\mathbf{G}^{-1} = \bigl(g^{ij}\bigr)\) is its inverse, and the matrix inequality means that \(\mathbf{V} - \mathbf{G}^{-1}/N\) is positive semi-definite.

The maximum likelihood estimator (MLE) is the maximizer of the likelihood,

\[
\hat{\boldsymbol{\xi}}_{\mathrm{MLE}} = \arg\max_{\boldsymbol{\xi}} \prod_{i=1}^{N} p\bigl(\boldsymbol{x}_i, \boldsymbol{\xi}\bigr). \tag{7.9}
\]
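In practice, the maximizer in (7.9) is found by minimizing the negative log-likelihood. A minimal sketch (assuming Python with NumPy and SciPy; the Gaussian model and the helper `neg_log_likelihood` are introduced only for this illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=500)  # data from the true model

def neg_log_likelihood(xi):
    # xi = (mu, log_sigma); optimizing log(sigma) keeps sigma > 0.
    mu, log_sigma = xi
    sigma = np.exp(log_sigma)
    # -log p(x_i; mu, sigma) summed over the sample, for a Gaussian model
    return np.sum(0.5 * ((x - mu) / sigma) ** 2 + log_sigma
                  + 0.5 * np.log(2.0 * np.pi))

res = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
# For the Gaussian, this agrees with the closed form: x.mean(), x.std(ddof=0)
print(mu_hat, sigma_hat)
```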

It is known that the MLE is asymptotically unbiased and that its error covariance satisfies

\[
\mathbf{V}_{\mathrm{MLE}} = \frac{1}{N}\,\mathbf{G}^{-1} + O\!\left(\frac{1}{N^2}\right), \tag{7.10}
\]

attaining the Cramér–Rao bound (7.7) asymptotically. Such an estimator is said to be Fisher efficient (first-order efficient).

Remark. We do not discuss Bayes estimators, in which a prior distribution over the parameters is used. However, when the prior distribution is uniform, the MLE coincides with the maximum a posteriori Bayes estimator. Moreover, the MLE has the same asymptotic properties under any regular Bayes prior. The information geometry of Bayesian statistics will be touched upon in a later chapter.
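Fisher efficiency can be checked numerically by repeating the estimation many times and comparing the empirical variance of the MLE with the bound \(g^{-1}/N\). A Monte Carlo sketch (again assuming Python with NumPy; the exponential model \(p(x, \lambda) = \lambda e^{-\lambda x}\), with Fisher information \(g(\lambda) = 1/\lambda^2\) and MLE \(\hat{\lambda} = 1/\bar{x}\), is chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0     # true rate of the exponential model p(x) = lam * exp(-lam * x)
trials = 20000

for N in (20, 200, 2000):
    x = rng.exponential(scale=1.0 / lam, size=(trials, N))
    lam_hat = 1.0 / x.mean(axis=1)   # MLE of the rate parameter
    var_mle = lam_hat.var()
    cr_bound = lam**2 / N            # Cramer-Rao bound: 1 / (N * g(lam))
    print(f"N={N:5d}  Var(MLE)={var_mle:.5f}  CR bound={cr_bound:.5f}")
```

As \(N\) grows, \(\mathrm{Var}(\hat{\lambda})\) approaches \(\lambda^2/N\), with the \(O(1/N^2)\) excess predicted by (7.10).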

7.2 Estimation in Exponential Family

An exponential family is a model having excellent properties such as dual flatness. We begin with the exponential family


\[
p(\boldsymbol{x}, \boldsymbol{\theta}) = \exp\{\boldsymbol{\theta} \cdot \boldsymbol{x} - \psi(\boldsymbol{\theta})\} \tag{7.11}
\]

to study the statistical theory of estimation, because it is simple and transparent. Given data \(D\), their joint probability distribution is written as

\[
p(D, \boldsymbol{\theta}) = \exp\bigl[N\{\boldsymbol{\theta} \cdot \bar{\boldsymbol{x}} - \psi(\boldsymbol{\theta})\}\bigr], \tag{7.12}
\]

where \(\bar{\boldsymbol{x}}\) is the arithmetic mean of the observed examples,

\[
\bar{\boldsymbol{x}} = \frac{1}{N} \sum_{i=1}^{N} \boldsymbol{x}_i. \tag{7.13}
\]
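By (7.12), the data enter the likelihood only through the sufficient statistic \(\bar{\boldsymbol{x}}\), so the MLE solves \(\nabla \psi(\hat{\boldsymbol{\theta}}) = \bar{\boldsymbol{x}}\). A minimal sketch (assuming Python with NumPy and using the Bernoulli family, for which \(\psi(\theta) = \log(1 + e^{\theta})\), purely as an example):

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true = 0.8                       # natural parameter of a Bernoulli family
p = 1.0 / (1.0 + np.exp(-theta_true))  # mean parameter: eta = psi'(theta)
x = rng.binomial(1, p, size=1000)

x_bar = x.mean()  # sufficient statistic: the data enter only through x_bar
# psi'(theta) = sigmoid(theta) = x_bar gives the closed-form MLE
# theta_hat = logit(x_bar).
theta_hat = np.log(x_bar / (1.0 - x_bar))
print(theta_hat)  # close to theta_true for large N
```

In the dual (expectation) coordinates \(\boldsymbol{\eta} = \nabla \psi(\boldsymbol{\theta})\), the MLE is simply \(\hat{\boldsymbol{\eta}} = \bar{\boldsymbol{x}}\); this moment-matching form reflects the dual flatness mentioned above.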