J. Xie & T. Xiong; JMLR W&CP 16:85–98, 2011.
Stochastic Semi-supervised Learning on Partially Labeled Imbalanced Data
In this paper, we describe the stochastic semi-supervised learning approach that we
used in our submission to all six tasks in the 2009-2010 Active Learning Challenge. The method is
designed to tackle binary classification problems in which the number of
labeled data points is extremely small and the two classes are highly imbalanced. It starts with
only one positive seed given by the contest organizer. Because the positive label is rare across
all datasets, we randomly pick additional unlabeled data points and treat them as “negative” seeds.
A classifier is trained on these “labeled” data points and is then used to predict labels for the
unlabeled data. We take the final result to be the average of n
Supervised learning was used as a large number of labels were purchased. Our approach
is shown to work well on 5 of the 6 datasets. The overall results ranked 3rd in the
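
The seeding-and-averaging procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which classifier was used, how many negative seeds were drawn, or what value n took, so everything specific here is an assumption made for illustration: the centroid-distance scorer stands in for the unspecified classifier, and the names `stochastic_scores`, `n_runs`, and `n_negatives` are hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def score(x, pos_centroid, neg_centroid):
    """Stand-in classifier (an assumption, not the authors' model):
    positive when x is closer to the positive centroid than to the negative one."""
    return math.dist(x, neg_centroid) - math.dist(x, pos_centroid)


def stochastic_scores(unlabeled, positive_seed, n_runs=50, n_negatives=5, seed=0):
    """Average each point's score over n_runs random draws of pseudo-negative seeds."""
    rng = random.Random(seed)
    totals = [0.0] * len(unlabeled)
    for _ in range(n_runs):
        # Randomly chosen unlabeled points are treated as "negative" seeds.
        # This is only reasonable because positives are assumed to be rare.
        negatives = rng.sample(unlabeled, n_negatives)
        neg_c = centroid(negatives)
        # With a single positive seed, the seed is its own centroid.
        for i, x in enumerate(unlabeled):
            totals[i] += score(x, positive_seed, neg_c)
    # The final result is the mean over all stochastic runs.
    return [t / n_runs for t in totals]
```

Averaging over many random draws is what keeps the sketch usable: any single draw may accidentally include a true positive among the "negative" seeds, but under heavy class imbalance this should happen rarely enough that the mean score still separates the classes.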