Theorem 3: The procedure of an ordered Gibbs sampler
applied to a dependency network for and , where each
is finite (and hence discrete) and each local distribution in
is positive, defines a Markov chain with a unique
stationary joint distribution for that can be reached from any
initial state of the chain.
Technically speaking, the procedure of Gibbs sampling
applied to an inconsistent dependency network is not itself a Gibbs
sampler, because there is no joint distribution consistent with all
the local distributions. Consequently, we call this procedure an
ordered pseudo-Gibbs sampler.
One rather disturbing property of this procedure is that it produces a
joint distribution that is likely to be inconsistent with many of the
conditional distributions used to produce it. Nonetheless, as we have
suggested and shall examine further, when each of the local
distributions of a dependency network are learned from the same data,
these distributions should be almost consistent with the joint
distribution.
Another disturbing observation is that the joint distribution obtained
will depend on the order in which the pseudo-Gibbs sampler visits the
variables. For example, consider the dependency network with the
structure
, saying that depends on
but does not depend on . If we draw sample-pairs
--that is, and then --then the resulting
stationary distribution will have and dependent. In
contrast, if we draw sample-pairs , then the resulting
stationary distribution will have and independent. Due to
the near consistency of the local distributions, however, the joint
distributions obtained from different orderings will be close. If we
indeed discover the dependency-network structure
in practice, then and must be ``almost'' independent.
Given a graphical model for a domain and an ordered Gibbs sampler that
extracts a joint distribution from that model, we can apply the rules
of probability to this joint distribution to answer probabilistic
queries. Alternatively, we can apply Algorithm 1 directly to a given
dependency network. Such an application may yield different answers
than those computed from the joint distribution obtained from the
dependency network, but can be justified heuristically due to near
consistency. In Sections 3.3 and 3.4, we
examine the issue of near consistency more carefully. In the
remainder of this section, we mention two basic properties of
dependency networks.
First, let us consider the distributions that can be represented by a
general dependency-network structure. Unlike the situation for
consistent dependency networks, a general dependency-network structure
and a Markov network structure with the same adjacencies do not
represent the same distributions. In fact, a dependency network with
a given structure defines a larger set of distributions than a Markov
network with the same adjacencies. As an example, consider the
dependency-network structure
. In a simple experiment, we sampled local distributions for this
structure from a uniform distribution. We then computed the
stationary distribution of the Markov chain defined by a pseudo-Gibbs
sampler with variable order . In all runs, we found
that and were conditionally dependent given in
the stationary distribution. In a Markov network with the same
adjacencies, and must be conditionally independent given
.
Second, let us consider a simple necessary condition for consistency.
We say that a dependency network for is bi-directional if
is a parent of if and only if is a parent of ,
for all and in . In addition, let be the
parent of node . We say that a consistent dependency
network is minimal if and only if, for every node and for
every parent , is not independent of given the
remaining parents of . With these definitions, we have the
following theorem, proved in the Appendix.
Theorem 4: A minimal consistent dependency network for
a positive distribution must be bi-directional.