Table 4:
Experiments on large training sets. See Table 3 for the description of the fields.

Dataset    | Model     | Time (NSP) | Time (SP) | # SV | Objective Function | Model Train | Model Test | Median Train | Median Test
-----------|-----------|------------|-----------|------|--------------------|-------------|------------|--------------|------------
Kin        | SVMTorch  | 11         | -         | 1140 | -212439.78         | 0.30        | 0.31       |              |
           | SVMTorchU | 32         | -         | 1140 | -212439.78         | 0.30        | 0.31       | 0.37         | 0.38
           | SVMTorchN | 86         | -         | 1140 | -212439.78         | 0.30        | 0.31       |              |
           | Nodelib   | 273        | -         | 1138 | -212478.38         | 0.30        | 0.31       |              |
Artificial | SVMTorch  | 235        | -         | 706  | -39569.14          | 0.21        | 0.34       |              |
           | SVMTorchU | 4394       | -         | 817  | -40025.98          | 0.20        | 0.33       | 27.29        | 14.25
           | SVMTorchN | 9182       | -         | 824  | -40016.55          | 0.20        | 0.34       |              |
           | Nodelib   | 2653       | -         | 764  | -40043.94          | 0.20        | 0.33       |              |
Forest     | SVMTorch  | 4573       | 4392      | 3019 | -56266.94          | 1.63        | 1.82       |              |
           | SVMTorchU | 40669      | 37769     | 4080 | -78297.27          | 0.40        | 0.93       | 0.81         | 1.59
           | SVMTorchN | 79237      | 73045     | 4233 | -78294.56          | 0.39        | 0.93       |              |
           | Nodelib   | 87133      | -         | 4088 | -78384.15          | 0.39        | 0.93       |              |
Sunspots   | SVMTorch  | 67         | -         | 1771 | -11215476.03       | 8.97        | 12.72      |              |
           | SVMTorchU | 1290       | -         | 1822 | -11229107.83       | 8.96        | 12.59      | 33.02        | 52.57
           | SVMTorchN | 2606       | -         | 1820 | -11229098.49       | 8.96        | 12.59      |              |
           | Nodelib   | 24022      | -         | 1818 | -11229124.45       | 8.96        | 12.59      |              |
MNIST      | SVMTorch  | 9874       | 6460      | 8532 | -1289.54           | 0.25        | 0.27       |              |
           | SVMTorchU | 33644      | 21482     | 8642 | -1290.66           | 0.25        | 0.27       | 0.98         | 0.97
           | SVMTorchN | 32095      | 20951     | 8634 | -1290.57           | 0.25        | 0.27       |              |
           | Nodelib   | > 10^6     | -         | -    | -                  | -           | -          |              |
Let us now turn to experiments using large datasets.
Table 4 shows the results obtained when using the whole training
set of each dataset, again with a cache size of 300 MB.
Since the problems are now too big to be kept in memory, the implementation
of the cache becomes very important, and comparisons between the algorithms
used in SVMTorch and Nodelib become more difficult.
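To make the role of the cache concrete, here is a minimal sketch of a least-recently-used (LRU) cache for rows of the kernel matrix. It is an illustration only, not SVMTorch's actual data structure; the RBF kernel, the class name, and the max_rows budget are all assumptions introduced here:

    # A minimal sketch of an LRU cache for kernel matrix rows. This is not
    # SVMTorch's actual implementation, only an illustration of why cache
    # behavior dominates training time once the kernel matrix no longer
    # fits in memory. The RBF kernel and all names here are assumptions.
    from collections import OrderedDict
    import numpy as np

    class KernelRowCache:
        def __init__(self, X, gamma, max_rows):
            self.X = X                 # training inputs, shape (n, d)
            self.gamma = gamma         # RBF width (hypothetical parameter)
            self.max_rows = max_rows   # row budget derived from byte limit
            self.rows = OrderedDict()  # index -> kernel row, in LRU order

        def row(self, i):
            if i in self.rows:
                self.rows.move_to_end(i)       # mark row i as most recent
                return self.rows[i]
            d2 = ((self.X - self.X[i]) ** 2).sum(axis=1)
            k = np.exp(-self.gamma * d2)       # K(x_i, x_j) for all j
            if len(self.rows) >= self.max_rows:
                self.rows.popitem(last=False)  # evict least recently used
            self.rows[i] = k
            return k

With float64 entries, a 300 MB budget allows roughly 300 * 2**20 / (8 * n) cached rows for n training examples, so on a problem the size of MNIST only a few hundred of the tens of thousands of kernel rows can be held at once, and the eviction policy strongly affects running time.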
Nevertheless, it is clear that SVMTorch is always faster, except
again for Artificial in the cases with no shrinking or with
unshrinking, while performance on the test sets remains similar.
However, note that shrinking sometimes leads to very poor test-set
performance, as is the case on Forest. Shrinking should
thus be used with care, particularly for large datasets, and the
parameter that decides when to eliminate a variable should be tuned
carefully before running a series of experiments on the same dataset.
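To make this warning concrete, here is a hedged sketch, in the same illustrative spirit as above, of one shrinking pass over the dual variables; the patience threshold stands in for the elimination parameter mentioned in the text, and all names and tolerances are invented here, not taken from SVMTorch:

    # A sketch of the shrinking heuristic: a variable whose multiplier has
    # sat at a bound for `patience` consecutive checks is removed from the
    # active set. `patience` plays the role of the tuning parameter the
    # text warns about; names and tolerances are illustrative assumptions.
    import numpy as np

    def shrink(alpha, lower, upper, at_bound_count, active,
               patience=10, tol=1e-12):
        """Return the active-set mask after one shrinking pass."""
        at_bound = (alpha <= lower + tol) | (alpha >= upper - tol)
        # count consecutive checks spent at a bound; reset when it moves
        at_bound_count[:] = np.where(at_bound, at_bound_count + 1, 0)
        # drop variables stuck at a bound for too long
        return active & ~(at_bound & (at_bound_count >= patience))

Under such a scheme, unshrinking (the SVMTorchU variant) would correspond to re-checking the optimality conditions of the dropped variables at the end of training and reoptimizing if any are violated, while a small patience shrinks aggressively and risks the kind of degradation seen on Forest.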
Note also that Nodelib was not able to solve MNIST after
11 days, hence the entry of more than 10^6 seconds in Table 4.