Let us now compare SVMTorch and Nodelib on small datasets. In the results given in TABLE 3, only the first 5000 training examples were used. The size of the cache was set to 300Mb, so that the whole kernel matrix could be kept in memory. For each problem, we also computed the output median (which minimizes the mean absolute error (MAE) without any information about the inputs) on the training set and computed the performance of this median over the training and the test sets in order to have a better idea of the performance of the various algorithms.
|
As can be seen, for all the datasets, SVMTorch is usually many times faster than Nodelib (except for Artificial in the case of SVMTorch without shrinking). Since the whole matrix of the quadratic problem was in memory, handling of the cache had no effect on the speed results. Thus one can conclude that one of the main differences between these algorithms is the selection of the subproblem, and that selecting the independently of the is very efficient. However, Nodelib gave slightly better results in terms of the objective function (probably due to the fact that their termination criterion is stronger than ours; see [1] for a comparison between termination criteria), but not in terms of test error. Note that for problems of this size, shrinking does not cause the performance to deteriorate too much (check the values of the objective function as well as the training and test set performances) but does speed the algorithm up a lot. Note also that using the sparse format was not good in the case of Forest but was good for MNIST, which has a higher input dimension: this means that the cost induced by the use of sparse format is balanced by the gain obtained in the kernel computation only when the input size is large enough.