Accurate Ensembles for Data Streams: Combining Restricted Hoeffding
Trees using Stacking
Albert Bifet (University of Waikato), Eibe Frank (University of Waikato),
Geoffrey Holmes (University of Waikato), and Bernhard Pfahringer (University
of Waikato);
JMLR W&P 13:225-240, 2010.
Abstract
The success of simple methods for classification shows that is is often
not necessary to model complex attribute interactions to obtain
good classification accuracy on practical problems. In this paper, we
propose to exploit this phenomenon in the data stream context by
building an ensemble of Hoeffding trees that are each limited to a
small subset of attributes. In this way, each tree is restricted to model
interactions between attributes in its corresponding subset. Because
it is not known a priori which attribute subsets are relevant for prediction,
we build exhaustive ensembles that consider all possible attribute
subsets of a given size. As the resulting Hoeffding trees are not all
equally important, we weigh them in a suitable manner to obtain accurate
classifications. This is done by combining the log-odds of their
probability estimates using sigmoid perceptrons, with one perceptron
per class. We propose a mechanism for setting the perceptrons' learning
rate using the ADWIN change detection method for data streams,
and also use ADWIN to reset ensemble members (i.e. Hoeffding trees)
when they no longer perform well. Our experiments show that the resulting
ensemble classifier outperforms bagging for data streams in
terms of accuracy when both are used in conjunction with adaptive
naive Bayes Hoeffding trees, at the expense of runtime and memory
consumption.