Bayesian Algorithms for Causal Data Mining
Subramani Mani, Constantin F. Aliferis, and Alexander Statnikov; JMLR W&CP 6:121-136, 2010.
Abstract
We present two Bayesian algorithms, CD-B and CD-H, for discovering unconfounded cause-and-effect relationships from observational data without assuming causal sufficiency, an assumption that precludes hidden common causes of the observed variables.
The CD-B algorithm first estimates the Markov blanket of a node X using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of X. Using the set of parents and the set of children, CD-B constructs a global Bayesian network and outputs the causal effects of a node X based on the identification of Y arcs.
Recall that if a node X has two parent nodes A and B and a child node C such that there is no arc between A and B, and neither A nor B is a parent of C, then the arc from X to C is called a Y arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node X. The subsequent steps are similar to those of CD-B.
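The Y arc definition above can be checked mechanically on a known directed graph. The following is a minimal sketch, not the CD-B or CD-H implementation: it assumes the graph is supplied as a collection of (parent, child) pairs with string node names, and the function name find_y_arcs is a hypothetical helper introduced only for illustration.

```python
from itertools import combinations


def find_y_arcs(edges):
    """Return the set of Y arcs (X, C) in a directed graph.

    `edges` is an iterable of (parent, child) pairs (an assumed input format).
    An arc X -> C is reported when X has two parents A and B with no arc
    between them, and neither A nor B is a parent of C.
    """
    edges = set(edges)
    parents, children = {}, {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
        children.setdefault(p, set()).add(c)

    def adjacent(u, v):
        # True if there is an arc between u and v in either direction.
        return (u, v) in edges or (v, u) in edges

    y_arcs = set()
    for x, pa in parents.items():
        # Try every unordered pair of parents of X.
        for a, b in combinations(sorted(pa), 2):
            if adjacent(a, b):
                continue
            for c in children.get(x, ()):
                if a not in parents[c] and b not in parents[c]:
                    y_arcs.add((x, c))
    return y_arcs


# A -> X <- B and X -> C form a Y structure, so X -> C is a Y arc.
edges = [("A", "X"), ("B", "X"), ("X", "C")]
print(find_y_arcs(edges))  # {('X', 'C')}
```

Adding an arc between A and B, or an arc from A to C, to the example graph removes the Y arc, because the corresponding condition in the definition no longer holds.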
We evaluated the CD-B and CD-H algorithms empirically on data simulated from four different Bayesian networks. We also present comparative results based on the identification of Y structures and Y arcs from the output of the PC, MMHC, and FCI algorithms. The results appear promising for mining, from observational data, causal relationships that are unconfounded by hidden variables.