A Combination of Boosting and Bagging
for KDD Cup 2009 - Fast Scoring on a Large Database
Jianjun Xie, Viktoria Rojkova, Siddharth Pal and
Stephen Coggeshall; JMLR W&CP 7:35-43, 2009.
Abstract
We present the ideas and methodologies that
we used to address the KDD Cup 2009 challenge on rank-ordering
the probability of churn, appetency and up-selling of wireless
customers. We choose stochastic gradient boosted trees (TreeNet®)
as our main classifier to handle this large, unbalanced
dataset. To further improve the robustness and
accuracy of our results, we bag a series of boosted tree models
together as our final submission. Through our exploration we
conclude that the most critical factors in achieving our results
are effective variable preprocessing and selection, proper
handling of imbalanced data, and the combination of bagging
and boosting techniques.
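
The combination of bagging and boosting described above can be illustrated with a minimal sketch. It is not the TreeNet implementation and makes several simplifying assumptions: each base learner is a one-split regression stump rather than a full tree, each boosting round fits the log-loss residuals on a random row subsample with a fixed learning rate, and the bagged score is the plain average of the per-model probabilities. All function names and parameter values (fit_stump, fit_boosted, fit_bagged, n_rounds, subsample, and so on) are hypothetical and chosen only for illustration. Variable preprocessing, variable selection and imbalanced data handling are not covered.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals, rows):
    """Least-squares regression stump (one split) fitted on the given rows."""
    best = None
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [residuals[i] for i in rows if X[i][j] <= thr]
            right = [residuals[i] for i in rows if X[i][j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # every feature is constant on this subsample
        mean = sum(residuals[i] for i in rows) / len(rows)
        return 0, float("inf"), mean, mean
    return best[1:]


def stump_value(stump, x):
    j, thr, left_value, right_value = stump
    return left_value if x[j] <= thr else right_value


def fit_boosted(X, y, rng, n_rounds=20, learning_rate=0.5, subsample=0.7):
    """Simplified stochastic gradient boosting for binary labels (0/1)."""
    n = len(X)
    prior = min(max(sum(y) / n, 1e-6), 1 - 1e-6)
    f0 = math.log(prior / (1 - prior))      # start from the log-odds of the prior
    scores = [f0] * n
    stumps = []
    k = min(n, max(2, int(subsample * n)))  # rows drawn per round
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the current score.
        residuals = [y[i] - sigmoid(scores[i]) for i in range(n)]
        rows = rng.sample(range(n), k)      # the "stochastic" part
        stump = fit_stump(X, residuals, rows)
        stumps.append(stump)
        for i in range(n):
            scores[i] += learning_rate * stump_value(stump, X[i])
    return f0, learning_rate, stumps


def predict_boosted(model, x):
    f0, learning_rate, stumps = model
    return sigmoid(f0 + learning_rate * sum(stump_value(s, x) for s in stumps))


def fit_bagged(X, y, n_bags=5, seed=0, **boost_args):
    """Fit one boosted model per bootstrap resample of the training rows."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        models.append(fit_boosted(Xb, yb, rng, **boost_args))
    return models


def predict_bagged(models, x):
    """Average the probabilities of the boosted models; used for rank-ordering."""
    return sum(predict_boosted(m, x) for m in models) / len(models)


# Toy unbalanced data: 10 positives out of 50 rows, second feature is noise.
X_toy = [[i / 50.0, (i * 7 % 50) / 50.0] for i in range(50)]
y_toy = [1 if i >= 40 else 0 for i in range(50)]
toy_models = fit_bagged(X_toy, y_toy, n_bags=5, seed=0)
toy_scores = [predict_bagged(toy_models, x) for x in X_toy]
```

Averaging over bootstrap resamples is intended to reduce the variance of any single boosted model, which is one plausible reading of the robustness gain mentioned in the abstract. Because the challenge asks for a rank-ordering, only the relative order of the averaged scores matters in this sketch.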