Causal & Non-Causal Feature Selection for Ridge Regression
Gavin C. Cawley; JMLR W&CP 3:107-128, 2009.
Abstract
In this paper we investigate the use of causal and non-causal feature selection methods
for linear classifiers in situations where the causal relationships between the input and response variables may differ between the training and operational data. The causal feature
selection methods investigated include inference of the Markov Blanket and inference of
direct causes and of direct effects. The non-causal feature selection method is based on
logistic regression with Bayesian regularisation using a Laplace prior. A simple ridge regression model is used as the base classifier, where the ridge parameter is efficiently tuned
so as to minimise the leave-one-out error, via eigen-decomposition of the data covariance
matrix. For tasks with more features than patterns, linear kernel ridge regression is used
for computational efficiency. Results are presented for all of the WCCI-2008 Causation and
Prediction Challenge datasets, demonstrating that, somewhat surprisingly, causal feature
selection procedures do not provide significant benefits in terms of predictive accuracy over
non-causal feature selection and/or classification using the entire feature set.
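
The ridge-parameter tuning mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a ridge model without a bias term, targets coded as ±1, and a user-supplied grid of candidate ridge parameters. The names `loo_ridge` and `brute_force_press` are hypothetical. The idea is that a single eigen-decomposition of X^T X allows the leave-one-out (PRESS) error to be evaluated in closed form for each candidate value, without refitting the model once per held-out pattern.

```python
import numpy as np


def loo_ridge(X, y, lambdas):
    """Choose the ridge parameter minimising the mean squared leave-one-out error.

    Assumes a ridge model with no bias term and strictly positive lambdas.
    Returns (best_lambda, best_press, weights).
    """
    # One eigen-decomposition of the (unnormalised) covariance matrix X^T X.
    s, V = np.linalg.eigh(X.T @ X)
    s = np.clip(s, 0.0, None)   # guard against tiny negative eigenvalues
    Z = X @ V                   # data projected onto the eigenvectors
    q = Z.T @ y

    best = None
    for lam in lambdas:
        inv = 1.0 / (s + lam)
        y_hat = Z @ (inv * q)   # fitted values on the training set
        h = (Z ** 2) @ inv      # diagonal of the hat matrix
        # Closed-form leave-one-out residuals: e_i / (1 - h_ii).
        loo_resid = (y - y_hat) / (1.0 - h)
        press = float(np.mean(loo_resid ** 2))
        if best is None or press < best[1]:
            best = (lam, press, V @ (inv * q))
    return best


def brute_force_press(X, y, lam):
    """Explicit leave-one-out loop, used only to check the closed form."""
    n, d = X.shape
    sq_errors = []
    for i in range(n):
        keep = np.arange(n) != i
        A = X[keep].T @ X[keep] + lam * np.eye(d)
        w = np.linalg.solve(A, X[keep].T @ y[keep])
        sq_errors.append((y[i] - X[i] @ w) ** 2)
    return float(np.mean(sq_errors))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 6))                                  # synthetic data
    y = np.sign(X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=40))
    lam, press, w = loo_ridge(X, y, np.logspace(-3, 3, 13))
    print(lam, press, brute_force_press(X, y, lam))
```

The sketch covers only the case with more patterns than features. For the opposite case the abstract refers to linear kernel ridge regression, where the same kind of calculation would be carried out on the n-by-n matrix X X^T rather than on X^T X.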