Empirical Evaluation

Contrary to the theoretical analysis in the previous section, where we focussed on the ``friendly'' case of pairing unordered binarization vs. single round robin, our empirical results compare the performance of a double round robin binarization with RIPPER as a base learner against both ordered binarization (RIPPER's default mode) and unordered binarization (RIPPER with the parameters -a unordered). In the case of a linear algorithm complexity, the double round robin should be about two times slower than the unordered binarization and four times slower than the ordered binarization. As discussed in the previous section, RIPPER's super-linear run-time¹¹ might improve the relative performance of round robin learning.

**Table:** *Runtime results:* The first column shows the average run-times (in CPU secs. user time) of R3. The subsequent columns show the ratio of R3 over unordered and ordered RIPPER. The reported run-times are average training time over all folds of a 10-fold cross-validation. The line at the bottom shows the geometric average of the 17 cross-validated performance ratios.
dataset	R3	vs. unordered	vs. ordered
abalone	140.28	3.14	3.27
car	6.71	1.55	1.47
glass	2.03	2.26	3.80
image	25.84	0.90	1.98
lr spectrometer	489.67	4.40	6.93
optical	275.69	0.63	1.34
page-blocks	36.66	1.43	1.93
sat	186.89	0.69	1.25
solar flares (c)	6.65	6.03	7.57
solar flares (m)	3.98	5.62	7.49
soybean	21.07	6.29	13.24
thyroid (hyper)	19.71	2.68	3.46
thyroid (hypo)	14.91	2.39	3.63
thyroid (repl.)	15.35	2.26	3.33
vehicle	7.66	1.22	2.10
vowel	16.22	0.87	2.16
yeast	16.90	1.77	3.12
average		2.02	3.19

Table 3 shows the run times in CPU secs. user time (measured on a Sparc Ultra-2 under Sun Solaris) of R3 and its performance ratios against RIPPER in unordered and ordered mode. The reported times are average training times,¹²i.e., the performance ratios can be interpreted as empirical estimates of the class penalty ratios $\frac{\pi_r}{\pi_{u}}$ and $\frac{\pi_r}{\pi_{o}}$ . On average, R3 is about two times slower than RIPPER in unordered mode, and about three times slower than RIPPER in default, ordered mode, while RIPPER's ordered mode is about 1.5 times faster than unordered mode (as opposed to the factor 2 we were assuming in the theoretical analysis at the end of Section 5.1). This means that despite the inefficient implementation, the empirical values are fairly close to the estimates we made at the end of the previous section: for a linear algorithm, we expected the double round-robin procedure to be about twice as slow as the unordered approach and about four times as slow than the ordered approach. Apparently, the additional savings--based on the fact that simpler concepts are learned for the pairwise tasks and that RIPPER's run-time is super-linear--make up for the losses due to the sub-optimal implementation as a wrapper. For more expensive base learning algorithms (like support vector machines), the analysis in the previous section lets us expect bigger savings.

Moreover, there are several cases where R3 is even faster than RIPPER in unordered mode and comes close to RIPPER in ordered mode. This is despite the fact that R3 is implemented as a perl-script that communicates to RIPPER by writing the training and test sets of the new tasks to the disk, while unordered and ordered binarization are native in RIPPER's efficient implementation in C. Clearly, a tight integration of round robin binarization into RIPPER's C-code would be more efficient.¹³