Data-dependent margin-based generalization bounds for classification
András Antos, Balázs Kégl, Tamás Linder, Gábor Lugosi;
3(Jul):73-98, 2002.
Abstract
We derive new margin-based inequalities for the probability of error
of classifiers. The main feature of these bounds is that they can be
calculated from the training data and therefore may be used
effectively for model selection. In particular, the bounds involve
empirical complexities measured on the training data (such as the
empirical fat-shattering dimension) as opposed to their worst-case
counterparts traditionally used in such analyses. Also, our bounds
appear to be sharper and more general than recent results involving
empirical complexity measures. In addition, we develop an
alternative data-based bound for the generalization error of classes
of convex combinations of classifiers involving an empirical
complexity measure that is easier to compute than the empirical
covering number or fat-shattering dimension. We also show examples of
efficient computation of the new bounds.
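To make the data-dependent flavor of these quantities concrete, below is a minimal illustrative sketch, not the paper's actual bound. For a convex combination f(x) = sum_j alpha_j h_j(x) of base classifiers with outputs in {-1, +1}, it computes the empirical margin error P_n[y f(x) <= theta] and a Monte Carlo estimate of the empirical Rademacher average of the base class on the sample, one example of a complexity measure computable from the training data alone. All function names, the toy data, and the choice of Rademacher averages as the stand-in complexity measure are assumptions for illustration.

```python
import numpy as np

def empirical_margin_error(H, alpha, y, theta):
    """Fraction of training points whose margin y * f(x) is at most theta.

    H     : (n, m) array, H[i, j] = h_j(x_i) in {-1, +1}
    alpha : (m,) convex weights (nonnegative, summing to 1)
    y     : (n,) labels in {-1, +1}
    """
    margins = y * (H @ alpha)           # y_i * f(x_i) for each training point
    return float(np.mean(margins <= theta))

def empirical_rademacher(H, n_rounds=1000, rng=None):
    """Monte Carlo estimate of the empirical Rademacher average
    E_sigma[ max_j (1/n) sum_i sigma_i h_j(x_i) ] of the base class
    restricted to the training sample (an illustrative data-dependent
    complexity measure, not the quantity used in the paper)."""
    rng = np.random.default_rng(rng)
    n, _ = H.shape
    total = 0.0
    for _ in range(n_rounds):
        sigma = rng.choice([-1.0, 1.0], size=n)  # random signs
        total += np.max(sigma @ H) / n           # best sign correlation over base classifiers
    return total / n_rounds

# Toy usage with random base classifiers and labels.
rng = np.random.default_rng(0)
n, m = 200, 10
H = rng.choice([-1.0, 1.0], size=(n, m))
y = rng.choice([-1.0, 1.0], size=n)
alpha = np.full(m, 1.0 / m)                      # uniform convex combination

print("empirical margin error at theta=0.1:",
      empirical_margin_error(H, alpha, y, theta=0.1))
print("empirical Rademacher estimate:", empirical_rademacher(H, rng=1))
```

Both quantities are functions of the sample alone, which is the point of the abstract: a bound built from such terms can be evaluated on the training data, unlike bounds stated in terms of worst-case complexities of the class.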