The em Algorithm for Kernel Matrix Completion with Auxiliary Data
Koji Tsuda, Shotaro Akaho, Kiyoshi Asai; 4(May):67-81, 2003.
Abstract
In biological data, it is often the case that observed data are
available only for a subset of samples. When a kernel matrix is
derived from such data, we have to leave the entries for unavailable
samples as missing. In this paper, the missing entries are completed
by exploiting an auxiliary kernel matrix derived from another
information source. A parametric model of kernel matrices is
created as a set of spectral variants of the auxiliary kernel matrix,
and the missing entries are estimated by fitting this model to the
existing entries. For model fitting, we adopt the em algorithm
(distinguished from the EM algorithm of Dempster et al., 1977) based
on the information geometry of positive definite matrices. We
report promising results on bacteria clustering experiments using two
marker sequences: 16S and gyrB.
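
The alternating fit described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the observed samples are ordered first, so the known entries form the top-left block; the divergence is the Kullback-Leibler divergence between zero-mean Gaussians whose covariances are the kernel matrices; and the spectral variants share the eigenvectors of the auxiliary matrix and differ only in their eigenvalues. The names em_complete, K_obs, aux and n_iter are hypothetical, and the closed-form steps are derived under these assumptions rather than taken from the paper.

```python
import numpy as np

def em_complete(K_obs, aux, n_iter=50):
    """Sketch: complete a kernel matrix whose top-left n x n block is observed.

    K_obs : (n, n) positive definite block of observed kernel values.
    aux   : (N, N) positive definite auxiliary kernel matrix, N > n.
    Assumes the observed samples come first in the ordering of aux.
    """
    n = K_obs.shape[0]
    # Assumed model: M(b) = V diag(b) V^T, where V holds the eigenvectors
    # of the auxiliary matrix and only the spectrum b is fitted.
    eigvals, V = np.linalg.eigh(aux)
    b = np.maximum(eigvals, 1e-8)  # start from the auxiliary spectrum
    for _ in range(n_iter):
        M = (V * b) @ V.T
        # e-step: closest completed matrix D to M with the observed block
        # held fixed (the conditional of hidden given visible is taken from M).
        A = np.linalg.solve(M[:n, :n], M[:n, n:])  # M_VV^{-1} M_VH
        D = np.empty_like(M)
        D[:n, :n] = K_obs
        D[:n, n:] = K_obs @ A
        D[n:, :n] = D[:n, n:].T
        D[n:, n:] = M[n:, n:] - M[n:, :n] @ A + A.T @ K_obs @ A
        # m-step: closest model matrix to D, i.e. b_j = v_j^T D v_j.
        b = np.sum(V * (D @ V), axis=0)
    return D

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((6, 6))
    aux = X @ X.T + 6 * np.eye(6)      # synthetic auxiliary kernel
    Y = rng.standard_normal((4, 4))
    K_obs = Y @ Y.T + np.eye(4)        # synthetic observed block
    completed = em_complete(K_obs, aux)
    print(completed.shape)
```

In this sketch the observed block is never altered, and the missing rows and columns are filled in by borrowing the conditional structure of the current model matrix. The data are synthetic and serve only to show the calling convention.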