Bayesian Algorithms for Causal Data Mining
Subramani Mani, Constantin F. Aliferis, and Alexander Statnikov; JMLR W&CP 6:121-136,
We present two Bayesian algorithms, CD-B and CD-H, for discovering unconfounded cause-and-effect relationships
from observational data without assuming causal sufficiency, an assumption that would preclude hidden common causes of the observed variables.
The CD-B algorithm first estimates the Markov blanket of a node X using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of X.
Using the set of parents and the set of children, CD-B constructs a global Bayesian network and outputs the causal effects of a node X based on the identification of Y arcs.
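For orientation, the Markov blanket of a node in a known directed acyclic graph consists of its parents, its children, and the other parents of its children. The sketch below reads that set off a given graph; it does not attempt the Bayesian greedy search that CD-B uses to estimate the blanket from data. The function name `markov_blanket` and the parent-set dictionary encoding are assumptions made for this illustration.

```python
def markov_blanket(parents, x):
    """Return the Markov blanket of x in a DAG.

    parents: dict mapping every node to the set of its parents
             (hypothetical encoding chosen for this sketch).
    """
    # Children of x: nodes that list x among their parents.
    children = {c for c, pa in parents.items() if x in pa}
    # Spouses of x: the other parents of x's children.
    spouses = {p for c in children for p in parents[c]} - {x}
    return set(parents.get(x, set())) | children | spouses


# Toy graph: A -> X <- B, X -> C <- D
toy_dag = {
    "A": set(),
    "B": set(),
    "D": set(),
    "X": {"A", "B"},
    "C": {"X", "D"},
}
blanket = markov_blanket(toy_dag, "X")
```

In this toy graph the blanket of X contains A and B (parents), C (child), and D (the other parent of C).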
Recall that if a node X has two parent nodes A and B and a child node C such that there is no arc between A and B, and A and B are not parents of C, then the arc from X to C is called a Y arc.
The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node X.
The subsequent steps are similar to those of CD-B.
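The Y arc condition can be checked mechanically on a graph whose structure is already known: an arc from X to C qualifies when X has two parents A and B with no arc between them, and neither A nor B is a parent of C. The following is a minimal sketch under that reading, not the CD-B or CD-H procedure itself; the function name `find_y_arcs` and the parent-set dictionary encoding are assumptions made for this illustration.

```python
from itertools import combinations


def find_y_arcs(parents):
    """Return the set of (X, C) arcs that are Y arcs in a DAG.

    parents: dict mapping every node to the set of its parents
             (hypothetical encoding chosen for this sketch).
    """
    def adjacent(u, v):
        # True if there is an arc between u and v in either direction.
        return u in parents.get(v, set()) or v in parents.get(u, set())

    y_arcs = set()
    for x, pa_x in parents.items():
        children = [c for c, pa_c in parents.items() if x in pa_c]
        # Try every pair of parents (A, B) of X.
        for a, b in combinations(pa_x, 2):
            if adjacent(a, b):
                continue  # A and B must not share an arc
            for c in children:
                # Neither A nor B may be a parent of C.
                if a not in parents[c] and b not in parents[c]:
                    y_arcs.add((x, c))
    return y_arcs


# Canonical Y structure: A -> X <- B, X -> C
y_dag = {"A": set(), "B": set(), "X": {"A", "B"}, "C": {"X"}}
print(find_y_arcs(y_dag))  # {('X', 'C')}
```

Adding an arc between A and B, or an arc from A to C, removes the Y arc, since either change violates one of the two conditions.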
We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks.
We also present comparative results based on the identification of Y structures and Y arcs from the output of the
PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.