Feature Selection for Unsupervised Learning
Jennifer G. Dy, Carla E. Brodley; 5(Aug):845--889, 2004.
Abstract
In this paper, we identify two issues involved in developing an
automated feature subset selection algorithm for unlabeled data:
the need for finding the number of clusters in conjunction with
feature selection, and the need for normalizing the bias of fe
ature selection criteria with respect to dimension.
We explore the feature selection problem and these issues through FSSEM
(Feature Subset Selection using Expectation-Maximization (EM) clustering)
and through two different performance criteria for evaluating candidate
feature subsets: scatter separability and maximum likelihood.
We present proofs of the dimensionality biases of these feature
selection criteria, and present a cross-projection normalization scheme
that can be applied to any criterion to ameliorate these biases.
Our experiments show the need for feature selection, the need for addressing
these two issues, and the effectiveness of our proposed solutions.
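To make the wrapper idea behind FSSEM concrete, the following is a minimal sketch, not the authors' implementation: it runs a greedy forward search over feature subsets, scoring each candidate with the raw log-likelihood of an EM-fit Gaussian mixture. The paper's method additionally searches over the number of clusters, considers scatter separability as an alternative criterion, and applies the cross-projection normalization to counter the dimensionality bias; none of that is shown here. The function names, the fixed cluster count, and the use of scikit-learn's GaussianMixture are illustrative assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

def log_likelihood_score(X, n_clusters=3, seed=0):
    """Average per-sample log-likelihood of an EM-fit Gaussian mixture.
    Note: this raw criterion carries the dimension bias discussed in the paper."""
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="full",
                          random_state=seed).fit(X)
    return gmm.score(X)

def forward_select(X, n_features_to_pick, n_clusters=3):
    """Greedy sequential forward search over feature subsets (illustrative)."""
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining and len(selected) < n_features_to_pick:
        best_feat, best_score = None, -np.inf
        for f in remaining:
            score = log_likelihood_score(X[:, selected + [f]], n_clusters)
            if score > best_score:
                best_feat, best_score = f, score
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two informative features with cluster structure plus three pure-noise features.
    centers = rng.normal(scale=5.0, size=(3, 2))
    informative = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in centers])
    noise = rng.normal(size=(300, 3))
    X = np.hstack([informative, noise])
    print("selected features:", forward_select(X, n_features_to_pick=2))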