An Ensemble of Three Classifiers for KDD
Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and
Selective Naive Bayes
Hung-Yi Lo, Kai-Wei Chang, Shang-Tse Chen,
Tsung-Hsien Chiang, Chun-Sung Ferng, Cho-Jui Hsieh, Yi-Kuang
Ko, Tsung-Ting Kuo, Hung-Che Lai, Ken-Yi Lin, Chia-Hsuan Wang,
Hsiang-Fu Yu, Chih-Jen Lin, Hsuan-Tien Lin and Shou-de Lin
JMLR W&CP 7:57-64, 2009.
Abstract
This paper describes our ensemble of three
classifiers for the KDD Cup 2009 challenge. First, we transform
the three binary classification tasks into a joint multi-class
classification problem and solve an L1-regularized maximum
entropy model under the LIBLINEAR framework. Second, we propose
a heterogeneous base learner, which is capable of handling
different types of features and missing values, and use
AdaBoost to improve the base learner. Finally, we adopt a
selective naïve Bayes classifier that automatically groups
categorical features and discretizes numerical ones. The
parameters are tuned using cross-validation results rather than
the feedback on 10% of the test set from the competition
website. Based on the
observation that the three positive labels are mutually exclusive, we
conduct a post-processing step using a linear SVM to jointly
adjust the prediction scores of each classifier on the three
tasks. Then, we average these prediction scores with careful
validation to get the final outputs. Our final average AUC on
the whole test set is 0.8461, which ranks third in the
slow track of KDD Cup 2009.
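
As a minimal sketch of two ideas mentioned in the abstract, the Python
below shows one way to fold three binary labels with mutually exclusive
positives into a single multi-class label, and one way to average
prediction scores and measure AUC. The function names, the four-class
encoding (0 for "no positive label"), the task order, and the
unweighted mean are assumptions made for illustration; they are not
taken from the paper, which selects its averaging by validation.

```python
from typing import List, Sequence

# Assumed task order; the three KDD Cup 2009 tasks.
TASKS = ("churn", "appetency", "upselling")


def to_joint_label(labels: Sequence[int]) -> int:
    """Map three binary labels (+1 / -1) to one class in {0, 1, 2, 3}.

    Class 0 means no task is positive; class k means task k-1 is positive.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    if len(positives) > 1:
        raise ValueError("positive labels are assumed to be mutually exclusive")
    return positives[0] + 1 if positives else 0


def average_scores(per_classifier_scores: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of per-example scores across classifiers (an assumption)."""
    n = len(per_classifier_scores)
    return [sum(column) / n for column in zip(*per_classifier_scores)]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic time, fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, `to_joint_label((-1, 1, -1))` yields class 2 under the
assumed task order, and `auc(average_scores([s1, s2, s3]), y)` scores an
averaged ensemble for one task, where `s1`, `s2`, `s3` are hypothetical
score lists from the three classifiers.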