Estimation in the Presence of Hidden Variables
8.1 EM Algorithm

8.1.1 Statistical Model with Hidden Variables

Let us consider a statistical model \(M = \left\{ p(\boldsymbol{x}, \boldsymbol{\xi}) \right\}\), where the vector random variable \(\boldsymbol{x}\) is divided into two parts, \(\boldsymbol{x} = (\boldsymbol{y}, \boldsymbol{h})\), so that \(p(\boldsymbol{x}, \boldsymbol{\xi}) = p(\boldsymbol{y}, \boldsymbol{h}; \boldsymbol{\xi})\). When \(\boldsymbol{x}\) is not fully observed but \(\boldsymbol{y}\) is observed, \(\boldsymbol{h}\) is called a hidden variable. In such a case, we estimate \(\boldsymbol{\xi}\) from the observed \(\boldsymbol{y}\). These situations occur in many applications. One can eliminate the hidden variable \(\boldsymbol{h}\) by marginalization, such that

\[
p_Y(\boldsymbol{y}, \boldsymbol{\xi}) = \int p(\boldsymbol{y}, \boldsymbol{h}; \boldsymbol{\xi}) \, d\boldsymbol{h}. \tag{8.1}
\]
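When the hidden variable is discrete, the integral in (8.1) becomes a finite sum and the marginalization can be written out directly. The following is a minimal numerical sketch (not from the book); the names w and mu, standing for the mixing probabilities and component means of a two-component mixture, are illustrative assumptions.

```python
import numpy as np

def joint_density(y, h, w, mu):
    """p(y, h; xi) = P(h) * N(y | mu_h, 1), with xi = (w, mu) (illustrative)."""
    return w[h] * np.exp(-0.5 * (y - mu[h]) ** 2) / np.sqrt(2.0 * np.pi)

def marginal_density(y, w, mu):
    """p_Y(y; xi) = sum_h p(y, h; xi): Eq. (8.1) with the integral a finite sum."""
    return sum(joint_density(y, h, w, mu) for h in range(len(w)))

w = np.array([0.3, 0.7])     # P(h): mixing probabilities (illustrative values)
mu = np.array([-1.0, 2.0])   # component means (illustrative values)
print(marginal_density(0.5, w, mu))   # density of the observed part y alone
```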
Then, we have a statistical model \(M_Y = \left\{ p_Y(\boldsymbol{y}, \boldsymbol{\xi}) \right\}\) which does not include hidden variables. However, in many cases the form of \(p(\boldsymbol{x}, \boldsymbol{\xi})\) is simple and estimation is tractable in \(M\), but \(M_Y\) is complicated because of the integration or summation over \(\boldsymbol{h}\), and estimation in it is computationally intractable. Typically, \(M\) is an exponential family. The EM algorithm is a procedure to estimate \(\boldsymbol{\xi}\) by using the larger model \(M\), from which the model \(M_Y\) is derived. Let us consider a larger model

\[
S = \left\{ q(\boldsymbol{y}, \boldsymbol{h}) \right\} \tag{8.2}
\]
consisting of all probability density functions of \((\boldsymbol{y}, \boldsymbol{h})\). When both \(\boldsymbol{y}\) and \(\boldsymbol{h}\) are binary variables, \(S\) is a probability simplex, so it is an exponential family. We study the continuous variable case similarly, without considering delicate mathematical problems. Model \(M\) is included in \(S\) as a submanifold. Observed data give an observed point

\[
\bar{q}(\boldsymbol{x}) = \frac{1}{N} \sum_{i} \delta(\boldsymbol{x} - \boldsymbol{x}_i) \tag{8.3}
\]
in \(S\) when examples \(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N\) are fully observed. This is the empirical distribution. When \(S\) is an exponential family, it is given by the sufficient statistic

\[
\bar{\boldsymbol{\eta}} = \bar{\boldsymbol{x}} = \frac{1}{N} \sum_{i} \boldsymbol{x}_i \tag{8.4}
\]

in the \(\eta\)-coordinates. The MLE is given by m-projecting \(\bar{q}(\boldsymbol{x})\) to \(M\).
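As a concrete baseline, consider the fully observed case for the simplest exponential family, a one-dimensional Gaussian with unit variance, where the expectation parameter is the mean. The m-projection of \(\bar{q}\) onto \(M\) then reduces to matching \(\bar{\eta} = \bar{x}\), so the MLE is just the sample mean. A minimal sketch (the data values are illustrative):

```python
import numpy as np

x = np.array([0.2, -0.5, 1.3, 0.9, 0.1])  # fully observed samples (illustrative)
eta_bar = x.mean()                         # Eq. (8.4): eta_bar = x_bar
mu_mle = eta_bar                           # moment matching gives the MLE of the mean
print(mu_mle)
```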
In the hidden variable case, we do not have a fully observed point \(\bar{q}(\boldsymbol{x})\). We observe only \(\boldsymbol{y}\), so we have an empirical distribution \(\bar{q}_Y(\boldsymbol{y})\) of \(\boldsymbol{y}\) alone. In order to obtain a candidate joint distribution \(\bar{q}(\boldsymbol{y}, \boldsymbol{h})\), we use an arbitrary conditional distribution \(q(\boldsymbol{h} \mid \boldsymbol{y})\) and put

\[
\bar{q}(\boldsymbol{y}, \boldsymbol{h}) = \bar{q}_Y(\boldsymbol{y}) \, q(\boldsymbol{h} \mid \boldsymbol{y}). \tag{8.5}
\]

Since \(q(\boldsymbol{h} \mid \boldsymbol{y})\) is arbitrary, we take all of them as possible candidates of observed points and consider the submanifold

\[
D = \left\{ \bar{q}(\boldsymbol{y}, \boldsymbol{h}) \;\middle|\; \bar{q}(\boldsymbol{y}, \boldsymbol{h}) = \bar{q}_Y(\boldsymbol{y}) \, q(\boldsymbol{h} \mid \boldsymbol{y}), \ q(\boldsymbol{h} \mid \boldsymbol{y}) \text{ arbitrary} \right\}. \tag{8.6}
\]
This \(D\) is the observed submanifold in \(S\) specified by the partially observed data \(\boldsymbol{y}_1, \ldots, \boldsymbol{y}_N\). By using the empirical distribution, it is written as

\[
\bar{q}(\boldsymbol{y}, \boldsymbol{h}) = \frac{1}{N} \sum_{i} \delta(\boldsymbol{y} - \boldsymbol{y}_i) \, q(\boldsymbol{h} \mid \boldsymbol{y}_i). \tag{8.7}
\]
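For discrete \(\boldsymbol{h}\), a point of \(D\) in the form (8.7) is specified entirely by the table of conditional probabilities \(q(h \mid \boldsymbol{y}_i)\), one row per observed sample. The following sketch (all numerical values are illustrative) builds one such candidate joint distribution and checks that its \(\boldsymbol{y}\)-marginal is the empirical distribution; mixing two such tables row-wise gives another point of \(D\), which is the linearity invoked next.

```python
import numpy as np

y = np.array([0.5, -1.2, 2.1])        # observed samples y_1, y_2, y_3 (illustrative)
q_h_given_y = np.array([[0.9, 0.1],   # an arbitrary choice of q(h | y_1)
                        [0.4, 0.6],   # an arbitrary choice of q(h | y_2)
                        [0.2, 0.8]])  # an arbitrary choice of q(h | y_3)

N = len(y)
mass = q_h_given_y / N                # q_bar puts mass q(h | y_i) / N on atom (y_i, h)
assert np.isclose(mass.sum(), 1.0)    # q_bar is a valid joint distribution
print(mass.sum(axis=1))               # y-marginal: weight 1/N on every y_i
```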
The data submanifold \(D\) is m-flat, because it is linear with respect to \(q(\boldsymbol{h} \mid \boldsymbol{y})\). Before analyzing the estimation procedure, we give two simple examples of the hidden variable model.

(1) Gaussian mixture model

Let \(N(\mu)\) be a Gaussian distribution of \(y\) with mean \(\mu\) and variance 1. We can treat more general multivariate Gaussian models with unknown covariance matrices in a similar way, but this simple model is enough for the purpose of illustration. The Gaussian mixture model draws each observation \(y\) from one of several such components, each chosen with a fixed mixing probability; the hidden variable \(h\) is the label of the component that generated \(y\).
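To make the example concrete, here is a minimal sketch of the EM iteration for this model, assuming two components with unit variance and parameters \(\boldsymbol{\xi}\) consisting of the mixing probabilities w and the means mu (names and values are illustrative, not the book's notation). The E-step chooses \(q(h \mid y_i)\) to be the posterior of \(h\) under the current \(\boldsymbol{\xi}\), i.e., a particular point of the data submanifold \(D\); the M-step re-estimates \(\boldsymbol{\xi}\) from the resulting expected sufficient statistics.

```python
import numpy as np

def em_gmm(y, w, mu, n_iter=50):
    """EM for a mixture of unit-variance Gaussians; xi = (w, mu) (illustrative)."""
    for _ in range(n_iter):
        # E-step: responsibilities r[i, h] = q(h | y_i), the posterior of h
        # under the current parameters -- one particular point of D.
        dens = w * np.exp(-0.5 * (y[:, None] - mu) ** 2) / np.sqrt(2.0 * np.pi)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate xi from the expected sufficient statistics.
        w = r.mean(axis=0)
        mu = (r * y[:, None]).sum(axis=0) / r.sum(axis=0)
    return w, mu

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-1.0, 1.0, 200), rng.normal(2.0, 1.0, 300)])
print(em_gmm(y, w=np.array([0.5, 0.5]), mu=np.array([0.0, 1.0])))
```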