Exploiting the High Predictive Power of Multi-class Subgroups
Tarek Abudawood (University of Bristol) and Peter Flach (University of Bristol);
JMLR W&P 13:177-192, 2010.
Abstract
Subgroup discovery aims at finding subsets of a population whose
class distribution is significantly different from the overall distribution.
A number of multi-class subgroup discovery methods has been
previously investigated, proposed and implemented in the CN2-MSD
system. When a decision tree learner was applied using the induced
subgroups as features, it led to the construction of accurate and compact
predictive models, demonstrating the usefulness of the subgroups.
In this paper we show that, given a significant, sufficient and diverse
set of subgroups, no further learning phase is required to build a good
predictive model. Our systematic study bridges the gap between rule
learning and decision tree modelling by proposing a method which
uses the training information associated with the subgroups to form a
simple tree-based probability estimator and ranker, RankFree-MSD,
without the need for an additional learning phase. Furthermore, we
propose an efficient subgroup pruning algorithm, RankFree-Pruning,
that prunes unimportant subgroups from the subgroup tree in order
to reduce the number of subgroups and the size of the tree without
decreasing predictive performance. Despite the simplicity of our
approach we experimentally show that its predictive performance in
general is comparable to other decision tree and rule learners over 10
multi-class UCI data sets.