Next: The algorithm and its
Up: The ACCL algorithm
Previous: Comparing mutual information between
Let
be a set of observations
over
binary variables. If
it is efficient to represent each
observation in
as a list of the variables that are on in
the respective observation. Thus, data point
will be represented by the list
. The space required by the lists is no more than
and
is much smaller than the space required by the binary vector
representation of the same data (i.e.,
).
Note, moreover, that the total number
of co-occurrences in
the dataset
is
For the variables that co-occur with
, a set of co-occurrence
lists
is created. Each
contains records
and is sorted by
decreasing
. To represent the lists
(that
contain many elements) we use their ``complements''
.
It can be shown [Meila-Predoviciu 1999] that the computation of the co-occurrence
counts and the construction of the lists
and
,
for all
, takes an amount of time proportional to the number
of co-occurrences
, up to a logarithmic factor:
Comparing this value with
, which is the time to compute
in the CHOWLIU algorithm, we see that the present method
replaces the dimension of the domain
by
. The memory requirements
for the lists are also at most proportional to
, hence
.
Next: The algorithm and its
Up: The ACCL algorithm
Previous: Comparing mutual information between
Journal of Machine Learning Research
2000-10-19