K. Tomanek & K. Morik; JMLR W&CP 16:169–181, 2011.
Inspecting Sample Reusability for Active Learning
Active Learning (AL) uses a learning algorithm to selectively sample examples that are expected to be highly useful for model learning. The resulting sample is therefore subject to a sampling selection bias. While a bias towards useful examples is desirable, there is also a bias towards the learner applied during AL selection. This paper addresses sample reusability, i.e., the question of whether, and under which conditions, samples selected by AL using one learning algorithm are well suited as training data for another learning algorithm.
Our empirical investigation on general classification problems, as well as on the natural language processing subtask of Named Entity Recognition, shows that many intuitive assumptions about reusability characteristics do not hold. For example, using the same algorithm during AL selection (called the selector) and for inducing the final model (called the consumer) is not always the optimal choice. We investigate several putatively explanatory factors for sample reusability. One finding is that the suitability of certain selector-consumer pairings cannot be estimated independently of the actual learning problem.
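To make the selector-consumer terminology concrete, the following is a minimal sketch of a reusability check, not the experimental setup of this paper. All specifics are assumptions chosen for illustration: the selector is a nearest-centroid classifier with margin-based uncertainty sampling, the consumer is a 1-nearest-neighbour classifier, the data are two synthetic Gaussian classes, and every function name is hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def centroid_fit(X, y):
    """Selector model (assumed): one mean vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def margin(model, x):
    """Gap between the two closest centroids; small means uncertain."""
    d = sorted(sq_dist(c, x) for c in model.values())
    return d[1] - d[0]

def al_select(pool_X, oracle_y, seed_idx, budget):
    """Uncertainty sampling: the selector repeatedly picks the pool example
    with the smallest margin until `budget` examples are labeled."""
    labeled = list(seed_idx)
    while len(labeled) < budget:
        model = centroid_fit([pool_X[i] for i in labeled],
                             [oracle_y[i] for i in labeled])
        chosen = set(labeled)
        candidates = [i for i in range(len(pool_X)) if i not in chosen]
        labeled.append(min(candidates, key=lambda i: margin(model, pool_X[i])))
    return labeled

def knn_predict(train_X, train_y, x):
    """Consumer model (assumed): 1-nearest-neighbour prediction."""
    j = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
    return train_y[j]

def consumer_accuracy(idx, pool_X, pool_y, test_X, test_y):
    """Train the consumer on the selected sample and score it on test data."""
    tr_X = [pool_X[i] for i in idx]
    tr_y = [pool_y[i] for i in idx]
    hits = sum(knn_predict(tr_X, tr_y, x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_X)

def make_data(n, rng):
    """Synthetic two-class data (illustrative only)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        cx = 1.5 if label else -1.5
        X.append([rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

rng = random.Random(0)
pool_X, pool_y = make_data(200, rng)
test_X, test_y = make_data(200, rng)

# Seed set: one example of each class so the selector has two centroids.
seed = [pool_y.index(0), pool_y.index(1)]
al_idx = al_select(pool_X, pool_y, seed, budget=20)

# Baseline: same seed, remaining examples drawn at random from the pool.
rest = [i for i in range(len(pool_X)) if i not in seed]
rd_idx = seed + rng.sample(rest, 18)

acc_al = consumer_accuracy(al_idx, pool_X, pool_y, test_X, test_y)
acc_rd = consumer_accuracy(rd_idx, pool_X, pool_y, test_X, test_y)
print(f"consumer on AL sample:     {acc_al:.3f}")
print(f"consumer on random sample: {acc_rd:.3f}")
```

In this sketch, the sample would count as reusable for the consumer if `acc_al` is at least as high as `acc_rd`. A single run on toy data says nothing about the general case; a meaningful comparison would average over many seeds and sample sizes.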