Boosting as a Regularized Path to a Maximum Margin Classifier
Saharon Rosset, Ji Zhu, Trevor Hastie; 5(Aug):941-973, 2004.
Abstract
In this paper we study boosting methods from a new perspective. We build on recent work by Efron et al. to show that boosting approximately (and in some cases exactly) minimizes its loss criterion with an ℓ1 constraint on the coefficient vector. This helps us understand the success of boosting with early stopping as regularized fitting of the loss criterion. For the two most commonly used criteria (exponential and binomial log-likelihood), we further show that as the constraint is relaxed, or equivalently as the boosting iterations proceed, the solution converges (in the separable case) to an "ℓ1-optimal" separating hyper-plane. We prove that this ℓ1-optimal separating hyper-plane has the property of maximizing the minimal ℓ1-margin of the training data, as defined in the boosting literature. An interesting fundamental similarity between boosting and kernel support vector machines emerges, as both can be described as methods for regularized optimization in high-dimensional predictor space, using a computational trick to make the calculation practical, and converging to margin-maximizing solutions. While this statement describes SVMs exactly, it applies to boosting only approximately.
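As a minimal sketch of the two optimization problems the abstract refers to, written here in assumed notation rather than quoted from the paper (β denotes the coefficient vector over a basis expansion h(x) of weak learners, and C the loss criterion, exponential or binomial log-likelihood), the constrained fit and its margin-maximizing limit can be expressed as:

% Sketch only; notation (beta, h, C) is assumed, not taken verbatim from the paper.
% The l1-constrained fit that boosting with early stopping approximately traces:
\[
\hat{\beta}(c) \;=\; \arg\min_{\|\beta\|_1 \le c}\; \sum_{i=1}^{n} C\!\left(y_i,\; \beta^{\top} h(x_i)\right)
\]
% As c grows (the constraint is relaxed, i.e. the boosting iterations proceed),
% the normalized solution converges in the separable case to the
% l1-margin-maximizing separating hyper-plane:
\[
\hat{\beta} \;=\; \arg\max_{\|\beta\|_1 = 1}\; \min_{i}\; y_i\, \beta^{\top} h(x_i)
\]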