Stochastic Semi-supervised Learning on Partially Labeled Imbalanced Data
J. Xie & T. Xiong; JMLR W&CP 16:85–98, 2011.
Abstract
In this paper, we describe the stochastic semi-supervised learning approach that we
used in our submission to all six tasks of the 2009–2010 Active Learning Challenge. The method is
designed for binary classification when the number of labeled data points is extremely small and
the two classes are highly imbalanced. It starts with only the single positive seed provided by the
contest organizers. Because the positive label is rare across all datasets, we randomly pick
additional unlabeled data points and treat them as “negative” seeds. A classifier is trained on these
“labeled” data points and then used to predict labels for the unlabeled data. The final result is the
average over n stochastic iterations.
Supervised learning was used once a large number of labels had been purchased. Our approach
worked well on 5 of the 6 datasets, and our overall results ranked 3rd in the
contest.
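The stochastic seeding loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, so a simple nearest-centroid scorer stands in for it, and the function names, the number of sampled negatives (`n_negatives`), and the fixed random seed are hypothetical choices.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def train_centroid_classifier(pos: List[Vector], neg: List[Vector]) -> Callable[[Vector], float]:
    """Hypothetical stand-in classifier (assumption): nearest-centroid scoring.

    Returns a scoring function that is positive when a point lies closer
    to the positive centroid than to the negative centroid.
    """
    def centroid(points: List[Vector]) -> List[float]:
        dim = len(points[0])
        return [sum(p[d] for p in points) / len(points) for d in range(dim)]

    c_pos, c_neg = centroid(pos), centroid(neg)

    def score(x: Vector) -> float:
        return math.dist(x, c_neg) - math.dist(x, c_pos)

    return score


def stochastic_seed_scores(data: List[Vector], positive_idx: List[int],
                           n_negatives: int, n_iterations: int,
                           seed: int = 0) -> List[float]:
    """Average classifier scores over n stochastic draws of "negative" seeds."""
    rng = random.Random(seed)
    positive_set = set(positive_idx)
    positives = [data[i] for i in positive_idx]
    unlabeled_idx = [i for i in range(len(data)) if i not in positive_set]
    totals = [0.0] * len(data)
    for _ in range(n_iterations):
        # Positives are rare, so a random draw from the unlabeled pool is
        # treated as negative even though it may contain a few true positives.
        neg_idx = rng.sample(unlabeled_idx, n_negatives)
        negatives = [data[i] for i in neg_idx]
        score = train_centroid_classifier(positives, negatives)
        # Score every data point with this iteration's classifier.
        for i, x in enumerate(data):
            totals[i] += score(x)
    # Final result: the average score over all stochastic iterations.
    return [t / n_iterations for t in totals]
```

For example, `stochastic_seed_scores(data, [0], n_negatives=3, n_iterations=20)` ranks every point by its averaged score relative to the seed at index 0. Averaging over many draws is what makes the random negative seeds tolerable: a true positive that is occasionally drawn as a "negative" only shifts the scores in those few iterations.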