Stopping Criterion for Boosting-Based Data Reduction Techniques: from Binary to Multiclass Problem

Marc Sebban, Richard Nock, Stéphane Lallich; 3(Dec):863-885, 2002.

Abstract

So far, boosting has been used to improve the quality of moderately accurate learning algorithms, by weighting and combining many of their weak hypotheses into a final classifier with theoretically high accuracy. In a recent work (Sebban, Nock and Lallich, 2001), we have attempted to adapt boosting properties to data reduction techniques. In this particular context, the objective was not only to improve the success rate, but also to reduce the time and space complexities due to the storage requirements of some costly learning algorithms, such as nearest-neighbor classifiers. In that framework, each weak hypothesis, which is usually built and weighted from the learning set, is replaced by a single learning instance. The weight given by boosting defines in that case the relevance of the instance, and a statistical test allows one to decide whether it can be discarded without damaging further classification tasks. In Sebban, Nock and Lallich (2001), we addressed problems with two classes. It is the aim of the present paper to relax the class constraint, and extend our contribution to multiclass problems. Beyond data reduction, experimental results are also provided on twenty-three datasets, showing the benefits that our boosting-derived weighting rule brings to weighted nearest neighbor classifiers.

[abs] [pdf] [ps.gz] [ps]