J. Xie, S. Leishman, L. Tian,
D. Lisuk, S. Koo & M. Blume; JMLR W&CP 18:183–197, 2012.
Feature Engineering in User’s Music Preference Prediction
The second track of this year’s KDD Cup asked contestants to separate a user’s
highly rated songs from unrated songs for a large set of Yahoo! Music listeners. We cast this task
as a binary classification problem and addressed it using gradient boosted decision trees. We
created a set of highly predictive features, each with a clear explanation. These features were
grouped into five categories: hierarchical linkage features, track-based statistical features,
user-based statistical features, features derived from the k-nearest neighbors of the users, and
features derived from the k-nearest neighbors of the items. No music domain knowledge was
needed to create these features. We demonstrate that each group of features improved the
prediction accuracy of the classification model. We also discuss the top predictive features of each
category in this paper.
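
To make the feature categories concrete, the following is a minimal sketch of how three of them (user-based statistics, track-based statistics, and an item k-nearest-neighbor feature) might be computed for one (user, track) pair. It is an illustration only, not the authors' feature set: the toy ratings, the function names, the choice of cosine similarity, and the value of k are all assumptions. The resulting feature dictionary is the kind of row a gradient boosted tree classifier would consume; the classifier itself is not shown.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy (user, track, rating) triples; not taken from the KDD Cup data.
RATINGS = [
    ("u1", "t1", 90), ("u1", "t2", 80),
    ("u2", "t1", 85), ("u2", "t3", 30),
    ("u3", "t2", 70), ("u3", "t3", 20),
]


def group_stats(ratings, key_index):
    """Return {key: (count, mean rating)}, grouped by user (0) or track (1)."""
    buckets = defaultdict(list)
    for row in ratings:
        buckets[row[key_index]].append(row[2])
    return {key: (len(vals), sum(vals) / len(vals)) for key, vals in buckets.items()}


def track_vectors(ratings):
    """Represent each track as a sparse {user: rating} vector."""
    vectors = defaultdict(dict)
    for user, track, rating in ratings:
        vectors[track][user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def features(user, track, ratings, k=2):
    """Build one feature row for a candidate (user, track) pair."""
    user_count, user_mean = group_stats(ratings, 0).get(user, (0, 0.0))
    track_count, track_mean = group_stats(ratings, 1).get(track, (0, 0.0))

    # Item k-NN: similarity of the candidate track to the k most similar
    # tracks that this user has already rated.
    vectors = track_vectors(ratings)
    candidate = vectors.get(track, {})
    rated = [t for u, t, _ in ratings if u == user and t != track]
    sims = sorted((cosine(candidate, vectors[t]) for t in rated), reverse=True)[:k]

    return {
        "user_count": user_count,
        "user_mean": user_mean,
        "track_count": track_count,
        "track_mean": track_mean,
        "item_knn_max_sim": sims[0] if sims else 0.0,
        "item_knn_mean_sim": sum(sims) / len(sims) if sims else 0.0,
    }


if __name__ == "__main__":
    print(features("u1", "t3", RATINGS))
```

Each feature has a direct reading (for example, "how similar is this track to the tracks the user already rated"), which mirrors the abstract's point that the features were built without music domain knowledge.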