Let us now compare SVMTorch and Nodelib on small datasets. In the results given in Table 3, only the first 5000 training examples were used. The cache size was set to 300 MB, so that the whole kernel matrix could be kept in memory. For each problem, we also computed the median of the training outputs (the constant prediction that minimizes the mean absolute error (MAE) without using any information about the inputs) and measured its performance on the training and test sets, in order to have a baseline against which to judge the various algorithms.
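The median baseline mentioned above can be made concrete with a short sketch (illustrative code, not part of the original experiments): the training-set median is the constant prediction that minimizes MAE, and its error is then reported on both splits.

```python
import numpy as np

def median_baseline_mae(y_train, y_test):
    """Constant-predictor baseline: the training-set median minimizes
    the mean absolute error (MAE) among all constant predictions.
    Returns the median and its MAE on the training and test sets."""
    m = np.median(y_train)
    train_mae = np.mean(np.abs(y_train - m))
    test_mae = np.mean(np.abs(y_test - m))
    return m, train_mae, test_mae

# Toy usage on hypothetical targets (not the paper's datasets):
y_tr = np.array([1.0, 2.0, 2.0, 3.0, 10.0])
y_te = np.array([2.0, 4.0])
m, tr_mae, te_mae = median_baseline_mae(y_tr, y_te)
```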
As can be seen, SVMTorch is usually many times faster than Nodelib on all the datasets (the exception being Artificial, when SVMTorch is run without shrinking). Since the whole matrix of the quadratic problem was held in memory, cache handling had no effect on the speed results. One can therefore conclude that one of the main differences between these algorithms lies in how the subproblem is selected, and that SVMTorch's selection strategy is very efficient.
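To illustrate what subproblem selection involves, here is a minimal sketch of a common gradient-based working-set heuristic for decomposition solvers (an assumption for illustration, not SVMTorch's exact rule): pick the variables whose gradient indicates the largest feasible improvement, skipping variables stuck at a bound.

```python
import numpy as np

def select_working_set(grad, alpha, y, C, q):
    """Sketch of gradient-based working-set selection for an SVM
    decomposition method (illustrative heuristic, not SVMTorch's
    exact rule).  d = -y * grad is the ascent direction for each
    dual variable; a variable only qualifies if it is free to move
    in that direction given its box constraint 0 <= alpha <= C."""
    d = -y * grad
    can_increase = alpha < C   # room to move up
    can_decrease = alpha > 0   # room to move down
    # Score each variable by the feasible part of its gradient step.
    score = (np.where((d > 0) & can_increase, d, 0.0)
             + np.where((d < 0) & can_decrease, -d, 0.0))
    # Return the q most-violating variable indices.
    return np.argsort(-score)[:q]
```

In practice solvers of this family recompute the gradient incrementally after each subproblem, so this selection costs only a pass over the variables.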
However, Nodelib gave slightly better results in terms of the objective function (probably because its termination criterion is stronger than ours; see [1] for a comparison of termination criteria), but not in terms of test error.
Note that for problems of this size, shrinking degrades performance only marginally (compare the values of the objective function as well as the training and test set results) while speeding the algorithm up considerably.
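The idea behind shrinking can be sketched as follows (an illustrative heuristic under our own assumptions, not SVMTorch's exact test): variables stuck at a bound whose gradient pushes them further against that bound are unlikely to move again, so they are removed from the active set and the solver works on a smaller problem.

```python
import numpy as np

def shrink(active, alpha, grad, y, C, tol=1e-3):
    """Shrinking sketch for an SVM decomposition solver (illustrative,
    not SVMTorch's exact criterion).  d = -y * grad is the ascent
    direction; a variable at a bound whose direction points further
    into the bound is dropped from the active set."""
    d = -y * grad
    stuck_at_lower = (alpha <= 0.0) & (d < -tol)  # at 0, wants to decrease
    stuck_at_upper = (alpha >= C) & (d > tol)     # at C, wants to increase
    keep = ~(stuck_at_lower | stuck_at_upper)
    return active[keep[active]]
```

Solvers using such a heuristic typically re-check the shrunk variables before declaring convergence, which is why shrinking can slightly change the final objective value.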
Note also that using the sparse format was not beneficial in the case of Forest but was for MNIST, which has a higher input dimension: the overhead induced by the sparse format is outweighed by the savings in kernel computation only when the input dimension is large enough.
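The trade-off can be seen in a small sketch (our own illustration, with a dict-based sparse representation chosen for brevity): the dense RBF kernel touches every coordinate, while the sparse version, using the expansion ||x1 - x2||^2 = ||x1||^2 + ||x2||^2 - 2<x1, x2> with precomputed squared norms, touches only nonzero entries, so it pays off only when nonzeros are few relative to the input dimension.

```python
import numpy as np

def rbf_dense(x1, x2, gamma):
    """Dense RBF kernel: cost proportional to the full input dimension."""
    diff = x1 - x2
    return np.exp(-gamma * np.dot(diff, diff))

def rbf_sparse(s1, s2, gamma, sq1, sq2):
    """Sparse RBF kernel from dicts {index: value}.  sq1 and sq2 are the
    precomputed squared norms of the two vectors.  Only the nonzeros of
    the shorter vector are iterated, so the cost grows with the number
    of nonzeros rather than the input dimension."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    dot = sum(v * s2.get(i, 0.0) for i, v in s1.items())
    return np.exp(-gamma * (sq1 + sq2 - 2.0 * dot))
```

For a low-dimensional, mostly dense problem like Forest, the per-entry bookkeeping of the sparse path costs more than it saves; for high-dimensional, sparse inputs like MNIST, the reverse holds.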