We compared our SVM implementation for regression problems (SVMTorch) to the one from Flake and Lawrence [3], using their publicly available software Nodelib. This comparison is interesting because Nodelib is based on SMO, where the variables α and α* are selected simultaneously, which is not the case for SVMTorch. Note also that Nodelib includes some enhancements over SMO which differ from those proposed by Shevade et al. [13].
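To make the distinction concrete, the following C++ fragment is a minimal sketch (assumed data layout and variable names, not the authors' code) of the selection step in a decomposition method that treats the 2*l regression variables independently and picks the single most KKT-violating one; a Nodelib/SMO-style step would instead score and optimize the pair (α_i, α_i*) belonging to one example jointly. The equality constraint of the dual is ignored here for brevity.

```cpp
#include <cstddef>
#include <vector>
#include <algorithm>

// How strongly a variable a (gradient g, box [0, C]) violates the KKT
// conditions: a positive gradient matters only if a can still grow,
// a negative one only if a can still shrink.
double kktViolation(double a, double g, double C) {
    double v = 0.0;
    if (a < C)   v = std::max(v,  g);
    if (a > 0.0) v = std::max(v, -g);
    return v;
}

struct Selected { std::size_t example; bool starred; double violation; };

// Scan all 2*l variables (alpha and alphaStar of every example) and return
// the single most violating one -- the independent-selection flavour.
Selected selectMostViolating(const std::vector<double>& alpha,
                             const std::vector<double>& alphaStar,
                             const std::vector<double>& gradAlpha,
                             const std::vector<double>& gradAlphaStar,
                             double C) {
    Selected best{0, false, 0.0};
    for (std::size_t i = 0; i < alpha.size(); ++i) {
        double v  = kktViolation(alpha[i],     gradAlpha[i],     C);
        double vs = kktViolation(alphaStar[i], gradAlphaStar[i], C);
        if (v  > best.violation) best = {i, false, v};
        if (vs > best.violation) best = {i, true,  vs};
    }
    // A joint scheme would rank examples by the combined violation of
    // (alpha_i, alphaStar_i) and update both variables at once.
    return best;
}
```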
Both algorithms use an internal cache in order to solve large-scale problems. All the experiments presented here were run on a Linux Pentium III 750MHz machine, with the gcc compiler. The parameters of the algorithms were not chosen to obtain the best generalization performance, since the goal was to compare the speed of the algorithms; however, we chose them so as to obtain reasonable results. Both programs used the same parameters with regard to cache, precision, etc. For Nodelib, the other parameters were set to the default values proposed by the authors. All the programs were compiled using double precision. We compared the programs on five different tasks:
Note also that Forest and MNIST are sparse datasets (a large fraction of the values in their input matrices are null). Since SVMTorch can handle sparse data (as can SVM-Light), we tested this option in the experiments described in TABLES 3 and 4. The parameters used to train the datasets can be found in TABLE 1. Note that all experiments used a Gaussian kernel and a value of C = 1000, and the termination criterion was that the KKT conditions be satisfied to a precision of 0.01.
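As an illustration of why the sparse option pays off on such data, here is a minimal sketch (assumed (index, value) representation and kernel parameterization, not the SVMTorch source) of a Gaussian kernel evaluated directly on sparse vectors, so that null entries are never touched.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A sparse vector stored as parallel arrays of sorted indices and values.
struct SparseVec {
    std::vector<std::size_t> idx;   // sorted feature indices
    std::vector<double> val;        // corresponding non-null values
};

double squaredNorm(const SparseVec& x) {
    double s = 0.0;
    for (double v : x.val) s += v * v;
    return s;
}

// Sparse dot product: merge the two sorted index lists.
double dot(const SparseVec& a, const SparseVec& b) {
    double s = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.idx.size() && j < b.idx.size()) {
        if (a.idx[i] == b.idx[j])      s += a.val[i++] * b.val[j++];
        else if (a.idx[i] < b.idx[j])  ++i;
        else                           ++j;
    }
    return s;
}

// K(x,z) = exp(-||x - z||^2 / (2 sigma^2)), expanded as
// ||x||^2 + ||z||^2 - 2 <x,z> so only non-null entries are read.
double gaussianKernel(const SparseVec& x, const SparseVec& z, double sigma) {
    double d2 = squaredNorm(x) + squaredNorm(z) - 2.0 * dot(x, z);
    return std::exp(-d2 / (2.0 * sigma * sigma));
}
```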
For the experiments involving SVMTorch, we tested a version with shrinking but without verifying, at the end of the optimization, whether all the suppressed variables satisfied the KKT conditions (SVMTorch), a version with no shrinking (SVMTorchN), and a version with shrinking and verification at the end of the optimization, as done in SVM-Light (SVMTorchU). As will be seen in the results, the first method has a large speed advantage and, in general, only a small negative impact on generalization performance. However, the default value of 100 iterations before a variable is removed by shrinking must sometimes be changed to obtain the correct solution.
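The following sketch (hypothetical bookkeeping, not the SVMTorch source) illustrates the shrinking idea just described: a variable that has sat at a bound without violating the KKT conditions for a given number of consecutive checks (100 by default in the text) is removed from the active problem, and the SVMTorchU variant additionally re-checks the suppressed variables at the end and reactivates any violators.

```cpp
#include <cstddef>
#include <vector>

struct Shrinker {
    std::size_t shrinkAfter = 100;          // checks at bound before removal
    std::vector<std::size_t> atBoundCount;  // per-variable counter
    std::vector<bool> active;               // false once a variable is shrunk

    explicit Shrinker(std::size_t n) : atBoundCount(n, 0), active(n, true) {}

    // Called once per optimization iteration for every still-active variable.
    void update(std::size_t i, bool atBoundAndSatisfiesKKT) {
        if (!active[i]) return;
        atBoundCount[i] = atBoundAndSatisfiesKKT ? atBoundCount[i] + 1 : 0;
        if (atBoundCount[i] >= shrinkAfter)
            active[i] = false;              // drop from the working problem
    }

    // SVMTorchU-style final pass: reactivate any shrunk variable that turns
    // out to violate the KKT conditions, forcing further optimization.
    bool reactivateViolators(const std::vector<bool>& violatesKKT) {
        bool reactivated = false;
        for (std::size_t i = 0; i < active.size(); ++i)
            if (!active[i] && violatesKKT[i]) { active[i] = true; reactivated = true; }
        return reactivated;
    }
};
```

The plain SVMTorch variant simply skips the final reactivation pass, which explains both its speed advantage and the occasional need to raise the 100-iteration threshold when variables are shrunk too aggressively.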