One-Class SVMs for Document Classification
Larry M. Manevitz, Malik Yousef
2(Dec):139-154, 2001.
Abstract
We implemented versions of the SVM appropriate for one-class classification
in the context of information retrieval. The experiments were conducted on
the standard Reuters data set.
For the SVM implementation we used both the version proposed by Schoelkopf et al.
and a somewhat different version of the one-class SVM, based on identifying
"outlier" data as representative of the second class.
We report on experiments with different kernels for both of these
implementations and with different representations of the data, including
binary vectors, tf-idf representation and a modification called "Hadamard"
representation.
We then compared these with one-class versions of the prototype (Rocchio),
nearest neighbor, and naive Bayes algorithms, and finally with a natural
one-class neural network classification method based on "bottleneck"
compression generated filters.
The SVM approach as represented by Schoelkopf was superior to all
the methods except the neural network one, to which it was essentially
comparable, although occasionally worse. However, the SVM methods
turned out to be quite sensitive to the choice of representation and
kernel in ways which are not well understood; for the time being, this
leaves the neural network approach as the most robust.
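
As a rough illustration of two of the document representations named above, the following minimal sketch builds binary and tf-idf vectors for a tiny corpus. The corpus, the function names, and the particular tf-idf weighting (raw term count times log(N/df)) are assumptions made for this example; the paper may use a different weighting. The "Hadamard" representation is not sketched because the abstract does not define it.

```python
import math
from collections import Counter

def build_vocabulary(docs):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({tok for doc in docs for tok in doc})

def binary_vector(doc, vocab):
    """1 if the term occurs in the document, 0 otherwise."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocab]

def tfidf_vector(doc, vocab, docs):
    """Assumed weighting: raw term count * log(N / document frequency)."""
    counts = Counter(doc)
    n_docs = len(docs)
    vec = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)
        idf = math.log(n_docs / df) if df else 0.0
        vec.append(counts[term] * idf)
    return vec

# Hypothetical toy corpus (not taken from the Reuters data set).
docs = [
    "oil prices rise".split(),
    "oil exports fall".split(),
    "grain exports rise".split(),
]

vocab = build_vocabulary(docs)
binary = [binary_vector(d, vocab) for d in docs]
tfidf = [tfidf_vector(d, vocab, docs) for d in docs]

for b, t in zip(binary, tfidf):
    print(b, [round(x, 3) for x in t])
```

In a one-class setting, vectors like these would be computed from positive examples only and then passed to whichever classifier is being evaluated.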