Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space
Simon Perkins, Kevin Lacker, James Theiler;
3(Mar):1333-1356, 2003.
Abstract
We present a novel and flexible approach to the problem of feature
selection, called
grafting. Rather than considering feature
selection as separate from learning, grafting treats the selection of
suitable features as an integral part of learning a predictor in a
regularized learning framework. To make this regularized learning
process sufficiently fast for large-scale problems, grafting operates
in an incremental iterative fashion, gradually building up a feature
set while training a predictor model using gradient descent. At each
iteration, a fast gradient-based heuristic is used to quickly assess
which feature is most likely to improve the existing model; that
feature is then added to the model, and the model is incrementally
optimized using gradient descent. The algorithm scales linearly with
the number of data points and at most quadratically with the number of
features. Grafting can be used with a variety of predictor model
classes, both linear and non-linear, and can be used for both
classification and regression. Experiments are reported here on a
variant of grafting for classification, using both linear and
non-linear models, and using a logistic regression-inspired loss
function. Results on a variety of synthetic and real-world data sets
are presented. Finally, the relationship between grafting, stagewise
additive modelling, and boosting is explored.
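
As a concrete illustration of the loop described in the abstract, the following is a minimal Python/NumPy sketch of grafting-style feature selection for a linear classifier with a logistic loss and an L1 penalty. The names grafting_fit, lam, and max_features, and the use of scipy.optimize.minimize for the inner re-optimization, are assumptions made for this sketch; it is not the authors' implementation, and the non-linear variants discussed in the paper are not shown.

```python
# Sketch only: a grafting-style incremental feature-selection loop for a
# linear classifier with logistic loss and an L1 penalty. Helper names and
# the choice of optimizer are illustrative assumptions, not the paper's code.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def grafting_fit(X, y, lam=0.1, max_features=20):
    """Incrementally select features for a linear classifier.

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    Returns the selected feature indices, their weights, and the bias.
    """
    n, d = X.shape
    selected = []        # indices of features added to the model so far
    w = np.zeros(0)      # weights of the selected features
    b = 0.0              # bias term, always part of the model

    def neg_margins(wv, bv):
        # -y * f(x) for the current linear model on the selected features
        return -y * (X[:, selected] @ wv + bv)

    for _ in range(min(max_features, d)):
        # Gradient heuristic: derivative of the mean logistic loss with
        # respect to every candidate weight, unselected weights held at zero.
        p = expit(neg_margins(w, b))          # sigmoid(-y * f(x))
        grad = -(X.T @ (y * p)) / n
        grad[selected] = 0.0                  # features already in the model

        best = int(np.argmax(np.abs(grad)))
        # Add the feature only if the drop in loss it promises outweighs
        # the L1 penalty its weight would incur; otherwise stop.
        if abs(grad[best]) <= lam:
            break
        selected.append(best)
        w = np.append(w, 0.0)

        # Incrementally re-optimize all selected weights plus the bias.
        def objective(theta):
            wv, bv = theta[:-1], theta[-1]
            loss = np.mean(np.logaddexp(0.0, neg_margins(wv, bv)))
            return loss + lam * np.sum(np.abs(wv))

        res = minimize(objective, np.append(w, b), method="Powell")
        w, b = res.x[:-1], res.x[-1]

    return selected, w, b
```

The gradient test mirrors the heuristic in the abstract: a candidate feature is added only when the magnitude of the loss gradient with respect to its (currently zero) weight exceeds the penalty weight lam, since otherwise the regularized optimum would keep that weight at zero.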