Logistic Model Trees with AUC Split
Criterion for the KDD Cup 2009 Small Challenge
Patrick Doetsch, Christian Buck, Pavlo Golik, Niklas
Hoppe, Michael Kramp, Johannes Laudenberg, Christian
Oberdörfer, Pascal Steingrube, Jens Forster and Arne
Mauser; JMLR W&CP 7:77-88, 2009.
Abstract
In this work, we describe our approach to the
"Small Challenge" of the KDD Cup 2009, a classification task
with incomplete data. Preprocessing, feature extraction and
model selection are documented in detail. We suggest a
criterion based on the number of missing values to select a
suitable imputation method for each feature. Logistic Model
Trees (LMT) are extended with a split criterion optimizing the
Area under the ROC Curve (AUC), which was the requested
evaluation criterion. By stacking boosted decision stumps and
LMT, we achieved the best result for the "Small Challenge"
without making use of additional data from other feature sets,
resulting in an AUC score of 0.8081. We also present results of
an AUC-optimizing model combination that scored only slightly
worse, with an AUC score of 0.8074.
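
To make the evaluation measure concrete, the following is a minimal sketch of how AUC can be computed and how a candidate split might be scored by it. The abstract does not specify the exact split criterion used in the extended LMT, so `split_auc` is a hypothetical illustration only: it assumes a binary threshold split on one numeric feature and scores each side by its fraction of positive labels. The function names, the `threshold` parameter and the toy data are all assumptions, not taken from the paper.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_auc(values, labels, threshold):
    """Hypothetical split score (an assumption, not the paper's definition):
    AUC of the two-level scorer that assigns each side of the split
    its own fraction of positive labels."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.5  # a degenerate split carries no ranking information
    rate_left = sum(left) / len(left)
    rate_right = sum(right) / len(right)
    scores = [rate_left if v <= threshold else rate_right for v in values]
    return auc(scores, labels)


# Toy data, invented for illustration.
values = [1, 2, 3, 4]
labels = [0, 0, 1, 1]
best_threshold = max([1, 2, 3], key=lambda t: split_auc(values, labels, t))
```

On this toy data the threshold 2 separates the two classes perfectly, so it is the one selected. The pairwise loop is quadratic in the number of examples; a rank-based formulation would be preferable at the scale of the challenge data.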