Subgroup Discovery with CN2-SD
Nada Lavrač, Branko Kavšek, Peter Flach, Ljupčo Todorovski; 5(Feb):153--188, 2004.
Abstract
This paper investigates how to adapt standard classification rule
learning approaches to subgroup discovery. The goal of subgroup
discovery is to find rules describing subsets of the population
that are sufficiently large and statistically unusual. The paper
presents a subgroup discovery algorithm, CN2-SD, developed
by modifying parts of the CN2 classification rule learner: its
covering algorithm, search heuristic, probabilistic classification
of instances, and evaluation measures. Experimental evaluation of
CN2-SD on 23 UCI data sets shows a substantial reduction in the
number of induced rules, increased rule coverage and rule
significance, and slight improvements in the area under the ROC
curve, compared with the CN2 algorithm. Application of CN2-SD to a
large traffic accident data set confirms these findings.
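
The requirement that a subgroup be both sufficiently large and statistically unusual can be made concrete with a small sketch. It assumes weighted relative accuracy, WRAcc(Cond -> Class) = p(Cond) * (p(Class | Cond) - p(Class)), as the rule quality measure. The abstract itself does not name the search heuristic, so this choice, the function name `wracc`, and the toy data are illustrative assumptions, not the paper's implementation.

```python
from typing import Sequence


def wracc(covered: Sequence[bool], positive: Sequence[bool]) -> float:
    """Weighted relative accuracy of a rule Cond -> Class (hypothetical helper).

    covered[i]  -- True if example i satisfies the rule condition
    positive[i] -- True if example i belongs to the target class
    """
    if len(covered) != len(positive):
        raise ValueError("covered and positive must have the same length")
    n = len(covered)
    if n == 0:
        raise ValueError("need at least one example")

    n_cond = sum(covered)                 # examples covered by the rule
    if n_cond == 0:
        return 0.0                        # an empty subgroup has no quality
    n_class = sum(positive)               # examples of the target class
    n_both = sum(c and p for c, p in zip(covered, positive))

    coverage = n_cond / n                 # p(Cond): how large the subgroup is
    unusualness = n_both / n_cond - n_class / n  # p(Class|Cond) - p(Class)
    return coverage * unusualness


# Toy data (made up): 10 examples, the first 4 belong to the target class.
positive = [True] * 4 + [False] * 6

# Rule A covers 5 examples, 4 of them positive: broad and fairly precise.
rule_a = [True] * 5 + [False] * 5
# Rule B covers a single positive example: perfectly precise but tiny.
rule_b = [True] + [False] * 9

score_a = wracc(rule_a, positive)
score_b = wracc(rule_b, positive)
print(f"rule A: {score_a:.3f}, rule B: {score_b:.3f}")
```

On this toy data the broader rule A scores higher than the perfectly precise but tiny rule B, which is the kind of preference a subgroup discovery heuristic needs and a pure accuracy measure would not give.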