We compared our SVM implementation for regression problems (SVMTorch) to the one from Flake and Lawrence [3] using their publicly available software Nodelib. This is interesting because Nodelib is based on SMO, where the variables α_i and α_i^* are selected simultaneously, which is not the case for SVMTorch. Note also that Nodelib includes some enhancements over SMO which are different from those proposed by Shevade et al. [13].
Both algorithms use an internal cache in order to be able to solve large-scale problems. All the experiments presented here were run on a Linux Pentium III 750MHz machine, with the gcc compiler. The parameters of the algorithms were not chosen to obtain the best generalization performance, since the goal was to compare the speed of the algorithms; however, we chose them so as to obtain reasonable results. Both programs used the same parameters with regard to cache, precision, etc. For Nodelib, the other parameters were set to the default values proposed by the authors. All the programs were compiled using double precision. We compared the programs on five different tasks:
Note also that the Forest and MNIST input matrices are sparse (a large fraction of their entries are null values).
Since SVMTorch can handle sparse data (as can SVM-Light), we
tested this option in the experiments described in
TABLES 3 and 4.
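To illustrate why the sparse option pays off (a simplified sketch, not SVMTorch's actual data structures), a sparse input vector can be stored as sorted (index, value) pairs, so the dot product inside a kernel evaluation only touches non-null entries:

```python
def sparse_dot(x, y):
    """Dot product of two vectors stored as sorted lists of (index, value)
    pairs.  Only non-null entries are stored, so the cost depends on the
    number of non-zeros rather than on the full input dimension."""
    i, j, total = 0, 0, 0.0
    while i < len(x) and j < len(y):
        xi, yj = x[i][0], y[j][0]
        if xi == yj:                     # index present in both vectors
            total += x[i][1] * y[j][1]
            i += 1
            j += 1
        elif xi < yj:                    # advance whichever lags behind
            i += 1
        else:
            j += 1
    return total

# Dense equivalents: [0, 2.0, 0, 3.0] and [1.0, 4.0, 0, 0]
x = [(1, 2.0), (3, 3.0)]
y = [(0, 1.0), (1, 4.0)]
print(sparse_dot(x, y))  # 8.0
```

With data such as MNIST, where most pixels are null, this merge-style scan visits far fewer entries than a dense loop over the full input dimension.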
The parameters used to train the datasets can be found in
TABLE 1.
Note that all experiments used a Gaussian kernel and a value of C = 1000, and the termination criterion was verification of the KKT conditions to a precision of 0.01.
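For concreteness, a minimal sketch of this setup (one common form of the Gaussian kernel; the width σ and the violation measure are illustrative assumptions, not SVMTorch's internals):

```python
import math

def gaussian_kernel(x, y, sigma):
    """One common Gaussian (RBF) kernel: K(x, y) = exp(-||x - y||^2 / sigma^2),
    for dense vectors x and y of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)

def converged(max_kkt_violation, eps=0.01):
    """Termination test: stop when the largest KKT violation over all
    variables falls below the precision eps (0.01 in the experiments)."""
    return max_kkt_violation < eps

print(gaussian_kernel([0.0, 0.0], [0.0, 0.0], sigma=1.0))  # 1.0
print(converged(0.005), converged(0.02))  # True False
```

A looser precision terminates earlier at the cost of a less exact solution, which is why both programs were run with the same value.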
For the experiments involving SVMTorch, we tested three variants: a version with shrinking but without verifying at the end of the optimization whether all the suppressed variables satisfied the KKT conditions (SVMTorch), a version with no shrinking (SVMTorchN), and a version with shrinking and verification at the end of the optimization, as done in SVM-Light (SVMTorchU). As will be seen in the results, the first method has a large speed advantage and, in general, only a small negative impact on generalization performance. However, the default value of 100 iterations before a variable is removed by shrinking must sometimes be changed to obtain the correct solution.
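The shrinking heuristic just described can be sketched as follows (a simplified illustration; the bookkeeping and names are assumptions, only the 100-iteration default comes from the text):

```python
def shrink(active, at_bound_count, shrink_after=100):
    """Split the active set: variables that have stayed at a bound for
    `shrink_after` consecutive iterations (default 100, as in the text)
    are removed from the optimization and frozen at their bound value.

    `active` is a list of variable indices still being optimized;
    `at_bound_count[i]` counts consecutive iterations variable i has
    spent at a bound (0 or C).  Returns (kept, shrunk)."""
    kept, shrunk = [], []
    for i in active:
        if at_bound_count[i] >= shrink_after:
            shrunk.append(i)
        else:
            kept.append(i)
    return kept, shrunk

active = [0, 1, 2, 3]
counts = {0: 120, 1: 5, 2: 100, 3: 0}
print(shrink(active, counts))  # ([1, 3], [0, 2])
```

The SVMTorch variant never revisits the shrunk variables, the SVMTorchU variant re-checks their KKT conditions after convergence (as SVM-Light does), and SVMTorchN simply never shrinks; the trade-off between these choices is what the results below measure.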