Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism
S.-C. Chin, W.N. Street
JMLR W&CP 27:133–144, 2012.
Abstract
The paper applies knowledge transfer methods to the problem of detecting Wikipedia vandalism, defined as malicious editing intended to compromise the integrity of article content. A major challenge in detecting Wikipedia vandalism is the scarcity of labeled training data. Knowledge transfer addresses this challenge by leveraging knowledge previously acquired from a source task. However, Wikipedia vandalism is heterogeneous, ranging from the replacement of a single letter to the mass deletion of text. Given this heterogeneity, selecting an informative subset of the source task to avoid potential negative transfer becomes a primary concern. The paper explores knowledge transfer methods that generalize models learned on a heterogeneous dataset to a more uniform dataset while avoiding negative transfer. It proposes two novel segmented transfer (ST) approaches, which map unlabeled data from the target task to the most related cluster in the source task and then classify that data using the most relevant learned models.
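The segmented-transfer idea described above (cluster the source task, learn one model per cluster, route each target instance to its most related cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which clustering algorithm, relatedness measure, or base classifier ST uses, so the sketch makes the following assumptions: k-means with deterministic farthest-first seeding stands in for the clustering, Euclidean distance to a cluster centroid stands in for "most related cluster", and a toy nearest-class-mean classifier stands in for the learned per-cluster models. All names (`kmeans`, `NearestMeanModel`, `segmented_transfer`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Sequence[float]]) -> Point:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def nearest(x: Sequence[float], centers: Dict[int, Point]) -> int:
    """Key of the center closest to x (first key wins on ties)."""
    return min(centers, key=lambda c: dist2(x, centers[c]))


def kmeans(points: List[Point], k: int, iters: int = 100):
    """Lloyd's k-means with deterministic farthest-first seeding (assumed choice)."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist2(p, s) for s in seeds)))
    centroids = dict(enumerate(seeds))
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        updated = {}
        for c, old in centroids.items():
            members = [p for p, a in zip(points, assign) if a == c]
            updated[c] = mean(members) if members else old  # keep empty clusters in place
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


class NearestMeanModel:
    """Toy stand-in classifier: predicts the label whose class mean is closest."""

    def __init__(self, xs: List[Point], ys: List[int]):
        groups: Dict[int, List[Point]] = {}
        for x, y in zip(xs, ys):
            groups.setdefault(y, []).append(x)
        self.means = {y: mean(g) for y, g in groups.items()}

    def predict(self, x: Point) -> int:
        return nearest(x, self.means)


def segmented_transfer(src_x: List[Point], src_y: List[int],
                       tgt_x: List[Point], k: int) -> List[int]:
    """Cluster the labeled source data, fit one model per cluster, then label each
    target point with the model of its nearest (most related) source cluster."""
    centroids, assign = kmeans(src_x, k)
    models: Dict[int, NearestMeanModel] = {}
    used: Dict[int, Point] = {}
    for c, center in centroids.items():
        idx = [i for i, a in enumerate(assign) if a == c]
        if idx:  # only non-empty clusters get a model
            models[c] = NearestMeanModel([src_x[i] for i in idx], [src_y[i] for i in idx])
            used[c] = center
    return [models[nearest(x, used)].predict(x) for x in tgt_x]


# Hypothetical toy data: two source segments whose feature-label relation is flipped.
src_x = [(0.0, 1.0), (0.0, -1.0), (1.0, 1.0), (1.0, -1.0),
         (10.0, 1.0), (10.0, -1.0), (11.0, 1.0), (11.0, -1.0)]
src_y = [1, 0, 1, 0, 0, 1, 0, 1]
tgt_x = [(0.5, 0.8), (10.5, 0.8), (10.5, -0.7), (0.2, -0.9)]

print(segmented_transfer(src_x, src_y, tgt_x, k=2))               # [1, 0, 1, 0]
print([NearestMeanModel(src_x, src_y).predict(x) for x in tgt_x])  # [1, 1, 1, 1]
```

The second line pools all source data into a single model. Because the two toy segments encode opposite feature-label relations, the pooled class means coincide and the pooled model predicts the same label everywhere, a small-scale picture of the negative transfer that the abstract says segmentation is meant to avoid. Routing each target point to its own segment's model recovers the segment-specific decision rule.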