Small Datasets

Let us now compare SVMTorch and Nodelib on small datasets. In the results given in TABLE 3, only the first 5000 training examples were used. The size of the cache was set to 300Mb, so that the whole kernel matrix could be kept in memory. For each problem, we also computed the output median (which minimizes the mean absolute error (MAE) without any information about the inputs) on the training set and computed the performance of this median over the training and the test sets in order to have a better idea of the performance of the various algorithms.

Table 3: Experiments on small training sets. SVMTorchN = SVMTorch without shrinking, SVMTorchU = SVMTorch with shrinking and unshrinking, Time NSP = time (in seconds) for non-sparse data format, Time SP = time (in seconds) for sparse data format, # SV = number of support vectors, Objective Function = value of (3) at the end of the optimization, Model Train = mean absolute error (MAE) over the training set, Model Test = MAE over the test set, Median Train = MAE over the training set with the median as predictor, Median Test = MAE over the test set with the median as predictor.

Dataset	Model	Time			Objective	Model		Median
		NSP	SP	# SV	Function	Train	Test	Train	Test
	SVMTorch	7	-	936	-173977.85	0.30	0.31
Kin	SVMTorchU	15	-	936	-173977.85	0.30	0.31	0.37	0.38
	SVMTorchN	45	-	941	-173982.65	0.30	0.31
	Nodelib	157	-	932	-174019.67	0.30	0.31
	SVMTorch	31	-	342	-13594.01	0.25	0.52
Artificial	SVMTorchU	166	-	367	-13703.07	0.24	0.51	25.16	27.95
	SVMTorchN	448	-	370	-13701.41	0.24	0.51
	Nodelib	231	-	342	-13707.29	0.24	0.51
	SVMTorch	21	21	993	-1012.46	0.51	0.80
Forest	SVMTorchU	65	43	1051	-1030.78	0.41	0.80	0.38	1.59
	SVMTorchN	110	115	1058	-1030.43	0.41	0.80
	Nodelib	542	-	1032	-1031.54	0.41	0.80
	SVMTorch	2	-	420	-3489571.13	9.65	10.12
Sunspots	SVMTorchU	9	-	422	-3489630.53	9.65	10.11	33.02	52.58
	SVMTorchN	38	-	422	-3489628.27	9.64	10.11
	Nodelib	327	-	422	-3489630.65	9.64	10.11
	SVMTorch	118	79	1861	-190.77	0.39	0.48
MNIST	SVMTorchU	147	98	1861	-190.77	0.39	0.48	0.98	0.97
	SVMTorchN	152	104	1861	-190.77	0.39	0.48
	Nodelib	2216	-	1878	-190.80	0.39	0.48

As can be seen, for all the datasets, SVMTorch is usually many times faster than Nodelib (except for Artificial in the case of SVMTorch without shrinking). Since the whole matrix of the quadratic problem was in memory, handling of the cache had no effect on the speed results. Thus one can conclude that one of the main differences between these algorithms is the selection of the subproblem, and that selecting the $\alpha_i$ independently of the $\alpha_i^{\star}$ is very efficient. However, Nodelib gave slightly better results in terms of the objective function (probably due to the fact that their termination criterion is stronger than ours; see [1] for a comparison between termination criteria), but not in terms of test error. Note that for problems of this size, shrinking does not cause the performance to deteriorate too much (check the values of the objective function as well as the training and test set performances) but does speed the algorithm up a lot. Note also that using the sparse format was not good in the case of Forest but was good for MNIST, which has a higher input dimension: this means that the cost induced by the use of sparse format is balanced by the gain obtained in the kernel computation only when the input size is large enough.