Ensemble techniques have received considerable attention in the recent machine learning literature (Bauer and Kohavi, 1999; Dietterich, 1997, 2000a; Opitz and Maclin, 1999). The idea of obtaining a diverse set of classifiers for a single learning problem and voting or averaging their predictions is both simple and powerful, and the resulting accuracy gains often have a sound theoretical foundation (Breiman, 1996; Freund and Schapire, 1997). Averaging the predictions of these classifiers helps to reduce the variance and often increases the reliability of the predictions. There are several techniques for obtaining a diverse set of classifiers. The most common is to use subsampling to diversify the training sets, as in bagging (Breiman, 1996) and boosting (Freund and Schapire, 1997). Other techniques include using different feature subsets (Bay, 1999), exploiting the randomness of the base algorithms (Kolen and Pollack, 1991), possibly by artificially randomizing their behavior (Dietterich, 2000b), or using multiple representations of the domain objects, for example information originating from different hyperlinks pointing to a web page (Fürnkranz, 2001a; Fürnkranz, 1999a). Finally, classifier diversity can be ensured by modifying the output labels, i.e., by transforming the learning task into a collection of related learning tasks that use the same input examples but different assignments of the class labels. Error-correcting output codes are the most prominent example of this type of ensemble method (Dietterich and Bakiri, 1995).
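To make the basic idea concrete, the following listing sketches ensemble construction by subsampling and prediction by unweighted voting; it is only an illustration (the learner interface learn and the bootstrap sampling are generic placeholders of bagging-style subsampling, not a description of any particular system):

    from collections import Counter
    import random

    def bagged_ensemble(train, learn, n_models=10, seed=0):
        """Train n_models classifiers, each on a bootstrap replicate of the
        training set (bagging-style subsampling)."""
        rng = random.Random(seed)
        models = []
        for _ in range(n_models):
            replicate = [rng.choice(train) for _ in train]
            models.append(learn(replicate))
        return models

    def vote(models, x):
        """Combine the ensemble by a simple majority vote over its predictions
        (each trained model is assumed to be a callable returning a class)."""
        return Counter(m(x) for m in models).most_common(1)[0][0]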
Clearly, round robin classification may also be interpreted as a member of this last group, and its performance gain may be seen in this context. Obviously, the final prediction is made by exploiting the redundancy provided by multiple models, each of which is constructed from a subset of the original data. However, contrary to subsampling approaches such as bagging and boosting, these datasets are constructed deterministically. In this respect pairwise classification shares more similarities with error-correcting output codes, but differs from them in its fixed procedure for setting up the new binary problems and in the fact that each of the new problems is smaller than the original problem. In particular, the latter fact may often cause the sub-problems in pairwise classification to be conceptually simpler than the original problem (as illustrated in Figure 1).
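For concreteness, the following sketch shows this pairwise decomposition; the binary learner learn_binary and the convention that a trained model is a callable returning one of its two classes are assumptions made for the illustration, not part of a specific implementation:

    from collections import Counter
    from itertools import combinations

    def round_robin_train(examples, labels, classes, learn_binary):
        """Train one binary classifier for each unordered pair of classes,
        using only the training examples of these two classes."""
        models = {}
        for a, b in combinations(classes, 2):
            pairs = [(x, y) for x, y in zip(examples, labels) if y in (a, b)]
            xs, ys = zip(*pairs)
            models[(a, b)] = learn_binary(xs, ys)
        return models

    def round_robin_predict(models, x):
        """Each of the c(c-1)/2 pairwise models casts one vote; the class
        receiving the most votes is predicted."""
        votes = Counter(model(x) for model in models.values())
        return votes.most_common(1)[0][0]

Each binary problem contains only the examples of its two classes, which is why the sub-problems are smaller (and often conceptually simpler) than the original multi-class problem.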
In previous work (Fürnkranz, 2001b), we observed that the improvements in accuracy obtained by R3 over RIPPER were quite similar to those obtained by C5.0-BOOST over C5.0 on the same problems. Round robin binarization seemed to work whenever boosting worked, and vice versa. The correlation coefficient of the error ratios of C5.0-BOOST/C5.0 and R3/RIPPER on the 20 datasets was about 0.618, which is in the same range as correlation coefficients for bagging and boosting (Opitz and Maclin, 1999). We interpreted this as weak evidence that the performance gains of round robin learning may be comparable to those of other ensemble methods and that it could be used as a general method for improving a learner's performance on multi-class problems. We investigate this question further in this section, focusing in particular on a comparison of round robin learning with boosting (Section 6.1) and bagging (Section 6.2), and on the potential of combining it with these techniques.
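The correlation above is computed between the two vectors of per-dataset error ratios; a minimal sketch of a Pearson correlation computation, assuming this is the statistic used (the 20 original ratio values are not reproduced here):

    from math import sqrt

    def pearson(xs, ys):
        """Pearson correlation coefficient of two equally long sequences."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sqrt(sum((x - mx) ** 2 for x in xs))
        sy = sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    # xs would hold the per-dataset ratios error(R3)/error(RIPPER),
    # ys the corresponding ratios error(C5.0-BOOST)/error(C5.0).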
dataset | C5.0 error | round robin error | ratio | boosting error | ratio | both error | ratio |
abalone | 78.48 | 75.08 | 0.957 | 77.88 | 0.992 | 74.67 | 0.951 |
car | 7.58 | 5.84 | 0.771 | 3.82 | 0.504 | 1.85 | 0.244 |
glass | 35.05 | 24.77 | 0.707 | 27.57 | 0.787 | 22.90 | 0.653 |
image | 3.20 | 2.90 | 0.905 | 1.60 | 0.500 | 1.73 | 0.541 |
lr spectrometer | 51.22 | 51.79 | 1.011 | 46.70 | 0.912 | 51.98 | 1.015 |
optical | 9.20 | 5.04 | 0.547 | 2.46 | 0.267 | 2.54 | 0.277 |
page-blocks | 3.09 | 2.98 | 0.964 | 2.58 | 0.834 | 2.78 | 0.899 |
sat | 13.82 | 13.16 | 0.953 | 9.32 | 0.675 | 9.00 | 0.651 |
solar flares (c) | 15.77 | 15.69 | 0.995 | 16.41 | 1.041 | 16.70 | 1.059 |
solar flares (m) | 4.90 | 4.90 | 1.000 | 5.90 | 1.206 | 5.83 | 1.191 |
soybean | 9.66 | 6.73 | 0.697 | 6.59 | 0.682 | 6.44 | 0.667 |
thyroid (hyper) | 1.11 | 1.14 | 1.024 | 1.03 | 0.929 | 1.33 | 1.190 |
thyroid (hypo) | 0.58 | 0.69 | 1.182 | 0.32 | 0.545 | 0.53 | 0.909 |
thyroid (repl.) | 0.72 | 0.74 | 1.037 | 0.90 | 1.259 | 0.90 | 1.259 |
vehicle | 26.24 | 29.20 | 1.113 | 24.11 | 0.919 | 23.17 | 0.883 |
vowel | 21.72 | 19.49 | 0.898 | 8.89 | 0.409 | 14.75 | 0.679 |
yeast | 43.26 | 40.63 | 0.939 | 41.85 | 0.967 | 40.77 | 0.942 |
average | | | 0.909 | | 0.735 | | 0.757 |
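In the table, each ratio column divides the respective method's error rate by the C5.0 error rate on the same dataset, so values below 1 indicate an improvement over C5.0. The reported averages appear to be geometric means of the ratio columns (an assumption on our part, but one that is consistent with the listed values 0.909, 0.735, and 0.757). A minimal sketch:

    from math import exp, log

    def error_ratio(method_error, baseline_error):
        """Ratio < 1 means the method improves on the C5.0 baseline."""
        return method_error / baseline_error

    def geometric_mean(ratios):
        """Average the ratios on a multiplicative scale."""
        return exp(sum(log(r) for r in ratios) / len(ratios))

    # e.g. glass: 24.77 / 35.05 ≈ 0.707, matching the round robin ratio column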