Learning Probabilistic Models of Link Structure
Lise Getoor, Nir Friedman, Daphne Koller, Benjamin Taskar
3(Dec):679-707, 2002.
Abstract
Most real-world data is heterogeneous and richly interconnected.
Examples include the Web, hypertext, bibliometric data and
social networks.
In contrast, most
statistical learning methods work with "flat" data representations,
forcing us to convert our data into a form that loses much of the
link structure.
The recently introduced framework of
probabilistic relational models (PRMs) embraces the object-relational
nature of structured data by capturing probabilistic interactions
between attributes of related entities. In this paper, we
extend this framework by modeling interactions between the attributes
and the link structure itself.
An advantage of our approach is a unified generative model for
both content and relational structure.
We propose two mechanisms for
representing a probability distribution over link structures:
reference uncertainty and
existence uncertainty. We describe the appropriate conditions for
using each model and present learning algorithms for each. We present
experimental results showing that the learned models can be used to
predict link structure and, moreover, the observed link
structure can be used to provide better predictions for the attributes
in the model.
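
As a rough illustration only, and not the learning algorithm from the paper, the sketch below assumes that existence uncertainty means treating the presence of each potential link as a binary random variable whose probability depends on attributes of the two endpoints. The function name `fit_existence_model`, the `topic` attribute, the undirected links, and the toy data are all hypothetical choices made for this example; the estimate is a plain maximum-likelihood count.

```python
from collections import Counter
from itertools import combinations


def fit_existence_model(topics, links):
    """Estimate P(link exists | unordered pair of endpoint topics) by counting.

    topics: dict mapping object id -> topic label (hypothetical attribute)
    links:  iterable of (id, id) pairs, treated as undirected (an assumption)
    """
    linked = {frozenset(pair) for pair in links}
    total, present = Counter(), Counter()
    # Every pair of objects is a potential link with a binary "exists" value.
    for a, b in combinations(sorted(topics), 2):
        key = tuple(sorted((topics[a], topics[b])))
        total[key] += 1
        if frozenset((a, b)) in linked:
            present[key] += 1
    # Maximum-likelihood estimate: fraction of potential links that exist.
    return {key: present[key] / total[key] for key in total}


# Toy data: four papers with a topic attribute and three observed links.
topics = {"p1": "ml", "p2": "ml", "p3": "db", "p4": "db"}
links = [("p1", "p2"), ("p3", "p4"), ("p1", "p3")]
model = fit_existence_model(topics, links)

for pair, prob in sorted(model.items()):
    print(pair, prob)
```

In this toy data, same-topic pairs are always linked while only one of the four cross-topic pairs is, so the fitted table assigns a much lower existence probability to cross-topic links. A model of this kind can score unseen pairs, which is the link-prediction use mentioned in the abstract.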