Combining Knowledge from Different Sources in Causal Probabilistic Models
Marek J. Druzdzel, Francisco J. Díez; 4(Jul):295-316, 2003.
Abstract
Building probabilistic and decision-theoretic models requires a
considerable knowledge engineering effort in which the most
daunting task is obtaining the numerical parameters. Authors of
Bayesian networks usually combine various sources of information,
such as textbooks, statistical reports, databases, and expert
judgement. In this paper, we demonstrate the risks of such a
combination, even when this knowledge encompasses such seemingly
population-independent characteristics as sensitivity and
specificity of medical symptoms. We show that the criteria "do
not combine knowledge from different sources" or "use only data
from the setting in which the model will be used" are neither
necessary nor sufficient to guarantee the correctness of the
model. Instead, we offer graphical criteria for determining when
knowledge from different sources can be safely combined into the
general population model. We also offer a method for building
subpopulation models. The analysis performed in this paper and the
criteria we propose may be useful in such fields as knowledge
engineering, epidemiology, machine learning, and statistical
meta-analysis.