Feature Discovery in Non-Metric Pairwise Data
Julian Laub, Klaus-Robert Müller; 5(Jul):801-818, 2004.
Abstract
Pairwise proximity data, given as a similarity or dissimilarity matrix,
can violate metricity. This occurs either due to noise or fallible
estimates, or due to intrinsic non-metric features such as those that
arise from human judgments. So far the problem of
non-metric pairwise data has been tackled by essentially omitting
the negative eigenvalues or shifting the spectrum of the associated
(pseudo-)covariance matrix for a subsequent embedding. However,
little attention has been paid to the negative part of the spectrum
itself. In particular, no answer has been given as to whether the
directions associated with the negative eigenvalues code any variance
other than noise. We show by a simple, exploratory analysis that the
negative eigenvalues can code for relevant structure in the data, thus
leading to the discovery of new features that are lost by conventional
data analysis techniques.
The information hidden in the negative eigenvalue part of the
spectrum is illustrated and discussed for three data sets, namely USPS
handwritten digits, text-mining and data from cognitive psychology.
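
To make the role of the negative spectrum concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the (pseudo-)covariance matrix is obtained by double-centering the dissimilarity matrix, C = -1/2 Q D Q with Q = I - (1/n) 1 1^T, and that each direction is scaled by the square root of the absolute eigenvalue. The toy data and all function names (squared_distances, pseudo_covariance, spectrum, project) are hypothetical and chosen only for illustration.

```python
import numpy as np


def squared_distances(points):
    """Pairwise squared Euclidean distances between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def pseudo_covariance(D):
    """Double-center a symmetric dissimilarity matrix: C = -1/2 * Q D Q."""
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n
    C = -0.5 * Q @ D @ Q
    return (C + C.T) / 2  # remove round-off asymmetry before eigh


def spectrum(D):
    """Eigenpairs of the pseudo-covariance, sorted by decreasing |eigenvalue|."""
    eigvals, eigvecs = np.linalg.eigh(pseudo_covariance(D))
    order = np.argsort(-np.abs(eigvals))
    return eigvals[order], eigvecs[:, order]


def project(eigvals, eigvecs, mask):
    """Coordinates along the selected directions, scaled by sqrt(|eigenvalue|)."""
    return eigvecs[:, mask] * np.sqrt(np.abs(eigvals[mask]))


# Hypothetical toy data: a non-metric dissimilarity built as the difference of
# two squared-distance matrices, so one feature enters with a negative sign.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))                 # feature entering positively
y = np.repeat([[-1.0], [1.0]], 10, axis=0)   # two-group feature entering negatively
D = squared_distances(x) - squared_distances(y)

eigvals, eigvecs = spectrum(D)
tol = 1e-8 * np.abs(eigvals).max()
positive = eigvals > tol
negative = eigvals < -tol

# Conventional embedding: keep only the positive part of the spectrum.
embedding_pos = project(eigvals, eigvecs, positive)
# Exploratory view: inspect the negative part instead of discarding it.
embedding_neg = project(eigvals, eigvecs, negative)

print("positive eigenvalues:", int(positive.sum()))
print("negative eigenvalues:", int(negative.sum()))
```

In this construction, discarding the negative part of the spectrum keeps only embedding_pos, whereas embedding_neg can be plotted or clustered in the same way to check whether it carries structure rather than noise.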