Feature Extraction by Non-Parametric Mutual Information Maximization
Kari Torkkola; 3(Mar):1415-1438, 2003.
Abstract
We present a method for learning discriminative feature transforms
using the mutual information between class labels and transformed
features as the criterion. Instead of the commonly used mutual
information measure based on Kullback-Leibler divergence, we use a
quadratic divergence measure, which admits an efficient
non-parametric implementation and requires no prior assumptions
about class densities. In addition to linear transforms, we also
discuss nonlinear transforms implemented as radial basis function
networks. Extensions that reduce the computational complexity are
also presented, and a comparison to greedy feature selection is
made.
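
As a worked sketch of the criterion described above (the notation
here is assumed, not taken from the abstract): the quadratic
divergence variant of mutual information between class labels C and
transformed features Y can be written as

    I_T(C, Y) = \sum_c \int_y \bigl( p(c, y) - P(c)\, p(y) \bigr)^2 \, dy .

With Parzen window density estimates built from Gaussian kernels
G(\cdot, \sigma^2 I), every integral in the expansion of I_T reduces
to a closed-form sum over sample pairs through the identity

    \int G(y - y_i, \sigma^2 I)\, G(y - y_j, \sigma^2 I)\, dy
        = G(y_i - y_j, 2\sigma^2 I),

which is what makes the non-parametric implementation efficient.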
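
A minimal sketch of how such a criterion could be evaluated,
assuming isotropic Gaussian Parzen kernels and a linear transform;
the function names, the kernel width sigma2, and the toy data are
illustrative, not the paper's code:

    import numpy as np

    def gauss(d2, sigma2, dim):
        # Isotropic Gaussian N(0, sigma2 * I) evaluated at squared distance d2.
        return np.exp(-d2 / (2.0 * sigma2)) / ((2.0 * np.pi * sigma2) ** (dim / 2.0))

    def quadratic_mi(Y, labels, sigma2=1.0):
        # Parzen-window estimate of the quadratic measure
        # I_T(C, Y) = sum_c int (p(c,y) - P(c) p(y))^2 dy.
        n, d = Y.shape
        # Convolving two kernels of variance sigma2 doubles the variance,
        # so pairwise interactions use 2 * sigma2.
        d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        K = gauss(d2, 2.0 * sigma2, d)
        same = (labels[:, None] == labels[None, :]).astype(float)
        classes, counts = np.unique(labels, return_counts=True)
        p_c = counts / n                              # class priors P(c)
        w = p_c[np.searchsorted(classes, labels)]     # P(c_i) for each sample
        v_in = (K * same).sum() / n**2                # sum_c int p(c,y)^2 dy
        v_all = (p_c**2).sum() * K.sum() / n**2       # sum_c P(c)^2 int p(y)^2 dy
        v_btw = (w[:, None] * K).sum() / n**2         # sum_c P(c) int p(c,y) p(y) dy
        return v_in + v_all - 2.0 * v_btw

    # Toy usage: the criterion would be maximized over the transform W,
    # e.g. by gradient ascent; here it is only evaluated once.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                     # hypothetical data
    labels = rng.integers(0, 2, size=100)
    W = rng.normal(size=(5, 2))                       # candidate linear transform
    print(quadratic_mi(X @ W, labels))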