Theorem 1: The set of positive distributions that can
be encoded by a consistent dependency network with graph G is
equal to the set of positive distributions that can be encoded by a
Markov network whose structure has the same adjacencies as G.
The two graphical representations are different in that
Markov networks quantify dependencies with potential functions,
whereas dependency networks use conditional probabilities. We have
found the latter to be significantly easier to interpret.
The proof of Theorem 1 appears in the Appendix, but it is essentially
a restatement of the Hammersley-Clifford theorem (e.g., Besag, 1974).
This correspondence is no coincidence. As is discussed in Besag
(1974), several researchers who developed the Markov-network
representation did so by initially investigating a graphical
representation that fits our definition of consistent dependency
network. In particular, several researchers, including Lévy
(1948), Bartlett (1955, Section 2.2), and Brook (1964), considered
lattice systems in which each variable depends only on its nearest
neighbors, and quantified the dependencies within these systems
using the conditional probability distribution of each variable
given its neighbors. They
then showed, to various levels of generality, that any joint
distribution mutually consistent with each of these conditional
distributions must also satisfy Equation 2. Hammersley
and Clifford, in a never-published manuscript, and Besag
(1974) considered the more general case in which each
variable could have an arbitrary set of parents. They showed that,
provided the joint distribution over the variables is positive, any
graphical model specifying the independencies in Equation 1 must also
satisfy Equation 2. One interesting point is that
these researchers argued for the use of conditional distributions to
quantify the dependencies. They considered the resulting potential
form in Equation 2 to be a mathematical necessity
rather than a natural expression of dependency. As we have just
discussed, we share this view.
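For concreteness, the potential form these researchers derived (our Equation 2) is the Gibbs factorization guaranteed by the Hammersley-Clifford theorem. In generic notation (the symbols here are illustrative, not necessarily those used elsewhere in this paper):

```latex
p(x_1, \ldots, x_n) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \phi_c(\mathbf{x}_c)
```

where $\mathcal{C}$ is the set of cliques of the undirected graph, each $\phi_c$ is a strictly positive potential function over the variables in clique $c$, and $Z$ is the normalizing constant.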
The equivalence of consistent dependency networks and Markov networks
suggests a straightforward approach for learning a consistent
dependency network from exchangeable (i.i.d.) data. Namely, one
learns the structure and potentials of a Markov network (e.g.,
Whittaker, 1990), and then computes (via
probabilistic inference) the conditional distributions required by the
dependency network. Alternatively, one can learn a related model such
as a Bayesian network, decomposable model, or hierarchical log-linear
model (see, e.g., Lauritzen, 1996) and convert it
to a consistent dependency network. Unfortunately, the conversion
process can be computationally expensive in many situations. In the
next section, we extend the definition of dependency network to
include inconsistent dependency networks and provide algorithms for
learning such networks that are more computationally efficient than
those just described. In the remainder of this section, we apply
well-known results about probabilistic inference to consistent dependency
networks. This discussion will be useful for our further development
of (general) dependency networks.
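The conversion just described can be illustrated with a brute-force sketch: posit the pairwise potentials of a small Markov network, recover the joint distribution via the product form of Equation 2, and read off the conditional distributions that the corresponding consistent dependency network stores. The variable names and potential values below are illustrative, and the exhaustive summation scales exponentially in the number of variables, which is precisely why the conversion can be computationally expensive.

```python
import itertools

import numpy as np

# Toy Markov network over three binary variables with chain structure
# X0 - X1 - X2, quantified by pairwise potentials (values are illustrative).
phi_01 = np.array([[2.0, 1.0], [1.0, 3.0]])  # potential on (X0, X1)
phi_12 = np.array([[1.5, 1.0], [1.0, 2.0]])  # potential on (X1, X2)

# Brute-force joint: p(x) is proportional to the product of clique
# potentials, normalized over all 2^3 configurations.
states = list(itertools.product([0, 1], repeat=3))
joint = np.array([phi_01[x0, x1] * phi_12[x1, x2] for x0, x1, x2 in states])
joint /= joint.sum()


def conditional(i, x):
    """p(x_i | x_{-i}): the local distribution a dependency network stores.

    Computed by probabilistic inference in the Markov network (here,
    exhaustive summation over the two values of X_i).
    """
    probs = np.empty(2)
    for v in range(2):
        y = list(x)
        y[i] = v
        probs[v] = joint[states.index(tuple(y))]
    return probs / probs.sum()


# The conditional for X1 given X0 = 1 and X2 = 1 (the None slot is
# the variable being queried).
p_x1 = conditional(1, (1, None, 1))
```

Note that `conditional(0, ...)` does not change as X2 varies: X0's local distribution depends only on its neighbor X1, matching the claim that the dependency network's graph shares the Markov network's adjacencies.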