Stopping Criterion for Boosting-Based Data Reduction Techniques: from Binary to Multiclass Problem
Marc Sebban, Richard Nock, Stéphane Lallich;
3(Dec):863-885, 2002.
Abstract
So far, boosting has been used to improve the quality of moderately accurate learning algorithms by weighting and combining many of their weak hypotheses into a final classifier of theoretically high accuracy. In recent work (Sebban, Nock, and Lallich, 2001), we attempted to adapt boosting to data reduction techniques. In that context, the objective was not only to improve the success rate, but also to reduce the time and space complexities caused by the storage requirements of some costly learning algorithms, such as nearest-neighbor classifiers. In that framework, each weak hypothesis, which is usually built and weighted from the learning set, is replaced by a single learning instance. The weight given by boosting then defines the relevance of that instance, and a statistical test decides whether it can be discarded without harming subsequent classification tasks. In Sebban, Nock, and Lallich (2001), we addressed two-class problems. The aim of the present paper is to relax this class constraint and extend our contribution to multiclass problems. Beyond data reduction, experimental results on twenty-three datasets show the benefits that our boosting-derived weighting rule brings to weighted nearest-neighbor classifiers.
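As a rough illustration of the kind of procedure the abstract describes, the Python sketch below gives each training instance an AdaBoost-style weight computed from the examples it helps classify under a k-NN rule, discards low-weight instances, and lets the surviving instances vote with their weights in a weighted nearest-neighbor classifier. The update rule, the simple threshold that stands in for the paper's statistical test, and all names (instance_weights, reduce_set, weighted_knn_predict) are illustrative assumptions, not the authors' exact algorithm.

```python
# Minimal sketch: boosting-style instance weighting for data reduction,
# loosely inspired by the abstract.  Details are assumptions, not the
# published method.
import numpy as np

def instance_weights(X, y, n_rounds=10, k=3):
    """Treat each instance as a 'weak hypothesis' that classifies the
    examples it is among the k nearest neighbors of, and accumulate an
    AdaBoost-like weight from its weighted error on those examples."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)        # an instance never votes for itself
    D = np.full(n, 1.0 / n)               # boosting distribution over examples
    alpha = np.zeros(n)                   # accumulated instance relevances
    for _ in range(n_rounds):
        for i in range(n):
            covered = np.argsort(dist[i])[:k]      # examples instance i "classifies"
            correct = (y[covered] == y[i])
            local = D[covered]
            eps = np.clip(local[~correct].sum() / max(local.sum(), 1e-12),
                          1e-6, 1 - 1e-6)
            a = 0.5 * np.log((1 - eps) / eps)      # AdaBoost-style coefficient
            alpha[i] += a
            # reweight covered examples as boosting would: down-weight the
            # correctly handled ones, up-weight the misclassified ones
            D[covered] = local * np.exp(-a * np.where(correct, 1.0, -1.0))
        D /= D.sum()
    return alpha

def reduce_set(X, y, threshold=0.0, **kw):
    """Discard instances whose accumulated relevance does not exceed a
    threshold; the paper instead uses a statistical test for this decision,
    which is not reproduced here."""
    alpha = instance_weights(X, y, **kw)
    keep = alpha > threshold
    return X[keep], y[keep], alpha[keep]

def weighted_knn_predict(Xr, yr, w, x, k=3):
    """Weighted k-NN vote in which each retained instance votes with its
    boosting-derived weight."""
    d = np.linalg.norm(Xr - x, axis=1)
    votes = {}
    for j in np.argsort(d)[:k]:
        votes[yr[j]] = votes.get(yr[j], 0.0) + w[j]
    return max(votes, key=votes.get)
```

Nothing here depends on the number of classes, which mirrors the multiclass extension discussed in the paper: the per-instance error is computed from agreement of labels rather than from a binary margin.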