Compression-Based Averaging of Selective Naive Bayes Classifiers

Marc Boullé

Year: 2007, Volume: 8, Issue: 58, Pages: 1659–1685


The naive Bayes classifier has proved to be very effective on many real data applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although variable selection is a desirable feature, it is prone to overfitting. In this paper, we introduce a Bayesian regularization technique to select the most probable subset of variables compliant with the naive Bayes assumption. We also study the limits of Bayesian model averaging in the case of the naive Bayes assumption and introduce a new weighting scheme based on the ability of the models to conditionally compress the class labels. The weighting scheme on the models reduces to a weighting scheme on the variables, and finally results in a naive Bayes classifier with "soft variable selection". Extensive experiments show that the compression-based averaged classifier outperforms the Bayesian model averaging scheme.
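
To make "soft variable selection" concrete, here is a minimal sketch of a naive Bayes prediction in which each variable's log-likelihood is scaled by a weight between 0 and 1. It is an illustration under stated assumptions, not the paper's implementation: the function name, the data layout, and the example weights are hypothetical, and the weights are simply taken as given rather than derived from the compression-based scheme the abstract describes.

```python
import math

def weighted_naive_bayes_posterior(prior, cond_probs, weights, x):
    """Return P(class | x) for a naive Bayes model with per-variable weights.

    prior      : dict mapping class -> P(class)
    cond_probs : list, one entry per variable, of dict class -> {value: P(value | class)}
    weights    : list of floats in [0, 1], one per variable (assumed given)
    x          : list of observed values, one per variable
    """
    log_scores = {}
    for c, p in prior.items():
        score = math.log(p)
        for k, value in enumerate(x):
            # weight 0 ignores the variable, weight 1 uses it fully
            score += weights[k] * math.log(cond_probs[k][c][value])
        log_scores[c] = score

    # Normalize in log space for numerical stability
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}

# Hypothetical toy model: two classes, one binary variable
prior = {"a": 0.5, "b": 0.5}
cond_probs = [{"a": {0: 0.8, 1: 0.2}, "b": {0: 0.2, 1: 0.8}}]

for w in (0.0, 0.5, 1.0):
    print(w, weighted_naive_bayes_posterior(prior, cond_probs, [w], [0]))
```

With a weight of 1 the variable contributes as in a standard naive Bayes classifier, and with a weight of 0 it is ignored, so the posterior falls back to the class prior. Intermediate weights interpolate between these two cases, in contrast to hard selection, which only allows weights of exactly 0 or 1.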