This paper has investigated the use of round robin binarization (or pairwise classification) as a technique for handling multi-class problems with separate-and-conquer rule learning algorithms (also known as covering algorithms). Our experimental results show that, in comparison to conventional ordered or unordered binarization, the round robin approach may yield significant gains in accuracy without risking a loss in performance. In particular, round robin binarization helps RIPPER to outperform C5.0 on multi-class problems, whereas C5.0 outperforms the original version of RIPPER on the same problems. We attribute this improvement, on the one hand, to the exploitation of the diverse predictions of an ensemble of classifiers and, on the other hand, to the fact that the resulting binary problems may be considerably simpler and can thus be learned more reliably (Figure 1).
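To make the technique concrete, the following is a minimal sketch of round robin (pairwise) classification: one binary classifier is trained for each unordered pair of classes, using only the examples of those two classes, and predictions are combined by simple voting. The nearest-centroid base learner here is purely illustrative (a stand-in for a rule learner such as RIPPER), and all names are our own, not the paper's implementation.

```python
from itertools import combinations

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

class NearestCentroidBinary:
    """Toy binary base learner (illustrative stand-in for a rule learner)."""
    def fit(self, xs, ys):
        self.classes = sorted(set(ys))
        self.centroids = {c: centroid([x for x, y in zip(xs, ys) if y == c])
                          for c in self.classes}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.classes, key=sq_dist)

def round_robin_fit(xs, ys):
    """Train one binary classifier per unordered pair of classes,
    each seeing only the examples of its two classes."""
    classes = sorted(set(ys))
    models = {}
    for ci, cj in combinations(classes, 2):
        pair = [(x, y) for x, y in zip(xs, ys) if y in (ci, cj)]
        pair_x, pair_y = zip(*pair)
        models[(ci, cj)] = NearestCentroidBinary().fit(list(pair_x), list(pair_y))
    return models

def round_robin_predict(models, x):
    """Combine the pairwise predictions by unweighted voting."""
    votes = {}
    for model in models.values():
        winner = model.predict(x)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)
```

For example, with three classes the procedure trains three binary classifiers (one per pair); each test example is classified by all of them, and the class receiving the most votes wins.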
Moreover, we demonstrated both empirically and theoretically that the resulting quadratic growth in the number of learning problems is compensated by the reduction in size for each of the individual problems. For base algorithms with linear or super-linear run-time, round robin binarization is faster than the conventional unordered technique. Furthermore, we have proven that this advantage increases with the complexity of the base algorithm. Note that these theoretical results are not restricted to rule learning, but are also applicable to perceptrons, support vector machines, boosting, and other learning algorithms that need binarization techniques for solving multi-class problems. Our experimental results confirm the efficiency of round robin binarization for rule learning, but for a true evaluation of the performance of this technique, an efficient, tight integration into a separate-and-conquer rule learning algorithm would be needed.
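The complexity argument can be sketched numerically. Assuming a uniform class distribution (n examples, c classes, so 2n/c examples per pairwise problem) and a base learner whose run-time grows as n^a, the total cost of the c(c-1)/2 pairwise problems can be compared to the c one-against-all problems of unordered binarization; the function names and the cost model are our illustration, not the paper's notation.

```python
def unordered_cost(n, c, a):
    """Unordered (one-against-all) binarization: c problems,
    each trained on all n examples, base learner cost n**a."""
    return c * n ** a

def round_robin_cost(n, c, a):
    """Round robin binarization: c*(c-1)/2 pairwise problems,
    each trained on 2n/c examples (uniform class distribution)."""
    return (c * (c - 1) / 2) * (2 * n / c) ** a
```

For a linear base learner (a = 1) the pairwise problems together contain (c-1)·n examples versus c·n for unordered binarization, so round robin is already slightly cheaper; for a super-linear learner (a > 1) the ratio in favour of round robin grows further, since each of the many problems is much smaller.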
Finally, we investigated the properties of round robin learning as a general ensemble technique, in particular in comparison to bagging and boosting. Round robin classification did not work as well for C5.0, and even less so for C5.0-BOOST, as it did for RIPPER, which is consistent with previous similar results on error-correcting output codes. Similarly, a straightforward integration of pairwise classification with bagging outperformed both constituent techniques when using RIPPER (and to some extent when using C5.0) as a base learner, but not in combination with boosting. It remains to be seen whether round robin learning does not work well for boosting in general, or whether this is a phenomenon specific to C5.0 and its boosting procedure. In any case, we can conclude that round robin classification provides a more efficient and more accurate alternative to the class binarization procedures that are currently in use in inductive rule learning.