We compared our SVM implementation for regression problems (SVMTorch) to the one from Flake and Lawrence [3], using their publicly available software Nodelib. This comparison is interesting because Nodelib is based on SMO, where the variables α and α* are selected simultaneously, which is not the case for SVMTorch. Note also that Nodelib includes some enhancements over SMO which differ from those proposed by Shevade et al. [13].
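To make the distinction concrete, the following C++ fragment is a minimal sketch (assumed data layout and variable names, not the authors' code) of the selection step in a decomposition method that treats the 2*l regression variables independently and picks the single most KKT-violating one; a Nodelib/SMO-style step would instead score and optimize the pair (α_i, α_i*) belonging to one example jointly. The equality constraint of the dual is ignored here for brevity.

```cpp
#include <cstddef>
#include <vector>
#include <algorithm>

// How strongly a variable a (gradient g, box [0, C]) violates the KKT
// conditions: a positive gradient matters only if a can still grow,
// a negative one only if a can still shrink.
double kktViolation(double a, double g, double C) {
    double v = 0.0;
    if (a < C)   v = std::max(v,  g);
    if (a > 0.0) v = std::max(v, -g);
    return v;
}

struct Selected { std::size_t example; bool starred; double violation; };

// Scan all 2*l variables (alpha and alphaStar of every example) and return
// the single most violating one -- the independent-selection flavour.
Selected selectMostViolating(const std::vector<double>& alpha,
                             const std::vector<double>& alphaStar,
                             const std::vector<double>& gradAlpha,
                             const std::vector<double>& gradAlphaStar,
                             double C) {
    Selected best{0, false, 0.0};
    for (std::size_t i = 0; i < alpha.size(); ++i) {
        double v  = kktViolation(alpha[i],     gradAlpha[i],     C);
        double vs = kktViolation(alphaStar[i], gradAlphaStar[i], C);
        if (v  > best.violation) best = {i, false, v};
        if (vs > best.violation) best = {i, true,  vs};
    }
    // A joint scheme would rank examples by the combined violation of
    // (alpha_i, alphaStar_i) and update both variables at once.
    return best;
}
```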
Both algorithms use an internal cache in order to solve large-scale problems. All the experiments presented here were run on a Linux Pentium III 750MHz machine, with the gcc compiler. The parameters of the algorithms were not chosen to obtain the best generalization performance, since the goal was to compare the speed of the algorithms; however, we chose them so as to obtain reasonable results. Both programs used the same parameters with regard to cache, precision, etc. For Nodelib, the other parameters were set to the default values proposed by the authors. All the programs were compiled using double precision. We compared the programs on five different tasks:
Note also that Forest and MNIST are sparse datasets (a large fraction of the values in their input matrices are null). Since SVMTorch can handle sparse data (as can SVM-Light), we tested this option in the experiments described in TABLES 3 and 4. The parameters used to train the datasets can be found in TABLE 1. Note that all experiments used a Gaussian kernel and a value of C = 1000, and the termination criterion was that the KKT conditions be satisfied to a precision of 0.01.
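As an illustration of why the sparse option pays off on such data, here is a minimal sketch (assumed (index, value) representation and kernel parameterization, not the SVMTorch source) of a Gaussian kernel evaluated directly on sparse vectors, so that null entries are never touched.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A sparse vector stored as parallel arrays of sorted indices and values.
struct SparseVec {
    std::vector<std::size_t> idx;   // sorted feature indices
    std::vector<double> val;        // corresponding non-null values
};

double squaredNorm(const SparseVec& x) {
    double s = 0.0;
    for (double v : x.val) s += v * v;
    return s;
}

// Sparse dot product: merge the two sorted index lists.
double dot(const SparseVec& a, const SparseVec& b) {
    double s = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.idx.size() && j < b.idx.size()) {
        if (a.idx[i] == b.idx[j])      s += a.val[i++] * b.val[j++];
        else if (a.idx[i] < b.idx[j])  ++i;
        else                           ++j;
    }
    return s;
}

// K(x,z) = exp(-||x - z||^2 / (2 sigma^2)), expanded as
// ||x||^2 + ||z||^2 - 2 <x,z> so only non-null entries are read.
double gaussianKernel(const SparseVec& x, const SparseVec& z, double sigma) {
    double d2 = squaredNorm(x) + squaredNorm(z) - 2.0 * dot(x, z);
    return std::exp(-d2 / (2.0 * sigma * sigma));
}
```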
For the experiments involving SVMTorch, we tested a version with shrinking but without verifying, at the end of the optimization, whether all the suppressed variables satisfied the KKT conditions (SVMTorch), a version with no shrinking (SVMTorchN), and a version with shrinking and verification at the end of the optimization, as done in SVM-Light (SVMTorchU). As will be seen in the results, the first method has a large speed advantage and, in general, only a small negative impact on generalization performance. However, the default value of 100 iterations before a variable is removed by shrinking must sometimes be changed to obtain the correct solution.
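The following sketch (hypothetical bookkeeping, not the SVMTorch source) illustrates the shrinking idea just described: a variable that has sat at a bound without violating the KKT conditions for a given number of consecutive checks (100 by default in the text) is removed from the active problem, and the SVMTorchU variant additionally re-checks the suppressed variables at the end and reactivates any violators.

```cpp
#include <cstddef>
#include <vector>

struct Shrinker {
    std::size_t shrinkAfter = 100;          // checks at bound before removal
    std::vector<std::size_t> atBoundCount;  // per-variable counter
    std::vector<bool> active;               // false once a variable is shrunk

    explicit Shrinker(std::size_t n) : atBoundCount(n, 0), active(n, true) {}

    // Called once per optimization iteration for every still-active variable.
    void update(std::size_t i, bool atBoundAndSatisfiesKKT) {
        if (!active[i]) return;
        atBoundCount[i] = atBoundAndSatisfiesKKT ? atBoundCount[i] + 1 : 0;
        if (atBoundCount[i] >= shrinkAfter)
            active[i] = false;              // drop from the working problem
    }

    // SVMTorchU-style final pass: reactivate any shrunk variable that turns
    // out to violate the KKT conditions, forcing further optimization.
    bool reactivateViolators(const std::vector<bool>& violatesKKT) {
        bool reactivated = false;
        for (std::size_t i = 0; i < active.size(); ++i)
            if (!active[i] && violatesKKT[i]) { active[i] = true; reactivated = true; }
        return reactivated;
    }
};
```

The plain SVMTorch variant simply skips the final reactivation pass, which explains both its speed advantage and the occasional need to raise the 100-iteration threshold when variables are shrunk too aggressively.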