
Working Set Size

Using the first 10000 examples of each dataset (or 6192 for Kin, which is smaller than that), we trained models with various values of the working set size q, ranging from 2 to 100. We used a fixed cache size of 100 MB and enabled shrinking, but did not use the sparse mode. The optimizer used to solve the subproblems of size q>2 was a conjugate gradient method with projection. Table 2 gives the results of these experiments. The value q=2 is consistently faster than any larger working set size, so we used q=2 in all subsequent experiments.
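To make the role of the inner solver concrete, here is a minimal sketch of solving one subproblem of size q iteratively. It uses plain projected gradient rather than the projected conjugate gradient method used in the experiments, omits the equality constraint of the SVM dual for brevity, and the function solve_box_qp and its parameters are illustrative assumptions, not taken from the paper's code.

import numpy as np

def solve_box_qp(Q, p, C, x0, max_iter=200, tol=1e-8):
    # Approximately minimize 0.5*x'Qx + p'x subject to 0 <= x <= C,
    # as a simplified stand-in for the inner solver of a decomposition
    # method (plain projected gradient; the equality constraint of the
    # SVM dual is omitted here).
    x = np.asarray(x0, dtype=float).copy()
    # Constant step size from the spectral norm of Q (an upper bound on
    # the Lipschitz constant of the gradient).
    step = 1.0 / (np.linalg.norm(Q, 2) + 1e-12)
    for _ in range(max_iter):
        grad = Q @ x + p
        x_new = np.clip(x - step * grad, 0.0, C)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Toy subproblem of size q = 4 with a random positive semi-definite Q.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A @ A.T
p = rng.standard_normal(4)
print(solve_box_qp(Q, p, C=10.0, x0=np.zeros(4)))

For q=2, by contrast, the two-variable subproblem can be solved analytically (as in SMO), so no such inner iterations are needed, which is consistent with q=2 being the fastest setting in Table 2.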


 
Table 2: Training time (in seconds) as a function of the working set size q, for non-sparse data.

             q=2    q=4   q=10   q=50   q=100
Kin           11     14     16     28      54
Artificial    98    149    190    629    1537
Forest       272    406    462    670     981
Sunspots       7     11     15     45      89
MNIST        573    664    829   1657    2213

