For a model $M$ and a dataset $\mathcal{D} = \{x^1, \dots, x^N\}$ the logarithm of the posterior $P(M \mid \mathcal{D})$ equals
\[
\log P(M) \;+\; \sum_{i=1}^{N} \log Q(x^i)
\]
plus an additive constant (the evidence term $-\log P(\mathcal{D})$, which does not depend on $M$).
The EM algorithm can be adapted to maximize the log posterior for every
fixed $m$ [Neal, Hinton 1999]. Indeed, by comparing with equation (5)
we see that the quantity to be maximized is now
\[
E\left[ l_c(x^{1,\dots,N}, z^{1,\dots,N} \mid M) \right] \;+\; \log P(M). \qquad (9)
\]
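The adaptation can be sketched in the free-energy view of EM that this citation refers to: EM is coordinate ascent on
\[
F(q, M) \;=\; E_q\!\left[ \log P(x^{1,\dots,N}, z^{1,\dots,N} \mid M) \right] \;-\; E_q[\log q] \;+\; \log P(M),
\]
where $q$ is a distribution over the hidden mixture labels; the E step maximizes $F$ over $q$ and the M step over $M$.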
The prior term $\log P(M)$ does not influence the E step of the EM algorithm,
which proceeds exactly as before (cf. equation (6)). To
be able to successfully maximize the right-hand side of (9)
in the M step, we require that $\log P(M)$ decompose into a sum of
independent terms matching the decomposition of $E[l_c]$ in
(7). A prior over mixtures of trees that is amenable to
this decomposition is called a decomposable prior. It has
the following product form:
\[
P(M) \;\propto\; \left( \prod_{k=1}^{m} P(E^k) \right) \left( P(\lambda_1, \dots, \lambda_m) \prod_{k=1}^{m} P(\theta^k \mid E^k) \right),
\]
where $E^k$ denotes the structure (edge set) of the $k$-th tree, $\theta^k$ its parameters and $\lambda_1, \dots, \lambda_m$ the mixture proportions.
The first parenthesized factor in this equation represents the prior
over the tree structures, while the second factor is the prior over the
parameters.
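Taking logarithms turns this product into exactly the kind of sum that the M step requires:
\[
\log P(M) \;=\; \sum_{k=1}^{m} \log P(E^k) \;+\; \log P(\lambda_1, \dots, \lambda_m) \;+\; \sum_{k=1}^{m} \log P(\theta^k \mid E^k) \;+\; \mathrm{const},
\]
with one term for the mixture proportions and one pair of terms per tree, so each can be added to the corresponding term of $E[l_c]$ and maximized separately.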
Requiring that the prior be decomposable is equivalent to making
several independence assumptions: in particular, the
parameters of each tree in the mixture are independent of the
parameters of all the other trees, as well as of the probability of the
mixture variable. In the following section we show that these assumptions
are not overly restrictive, by constructing decomposable priors
for tree structures and parameters and by showing that this class is rich
enough to contain members of practical importance.
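To make the practical consequence concrete, here is a minimal sketch of the resulting M step. It assumes a hypothetical per-tree MAP fitter fit_tree_map and, purely for illustration, a Dirichlet prior on the mixture proportions; the concrete structure and parameter priors are developed in the following sections.
\begin{verbatim}
import numpy as np

def m_step(gamma, data, fit_tree_map, alpha=2.0):
    """One M step of MAP-EM for a mixture of m trees under a decomposable prior.

    gamma[i, k] : posterior responsibility of tree k for data point i,
                  computed in the (unchanged) E step.
    fit_tree_map: hypothetical routine fitting one tree to the weighted data
                  by maximizing weighted log-likelihood plus that tree's
                  own log-prior.
    alpha       : parameter of the (assumed, illustrative) Dirichlet prior
                  on the mixture proportions lambda.
    """
    N, m = gamma.shape
    Gamma = gamma.sum(axis=0)          # total responsibility of each tree

    # Mixture proportions: argmax of sum_k Gamma_k log(lambda_k) plus the
    # log Dirichlet prior; for alpha >= 1 this has the closed form below.
    lam = (Gamma + alpha - 1.0) / (N + m * (alpha - 1.0))

    # Each tree is fit independently -- the payoff of decomposability:
    # the joint maximization splits into m separate weighted MAP problems.
    trees = [fit_tree_map(weights=gamma[:, k], data=data) for k in range(m)]
    return lam, trees

# Example with random responsibilities and a trivial stand-in fitter:
rng = np.random.default_rng(0)
g = rng.random((100, 3))
g /= g.sum(axis=1, keepdims=True)
lam, trees = m_step(g, data=None,
                    fit_tree_map=lambda weights, data: weights.mean())
\end{verbatim}
Because the prior factorizes, the $m+1$ subproblems above are solved in isolation, just as in the maximum-likelihood M step but with each subproblem's own log-prior added.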