Information Theoretic Model Selection for Pattern Analysis
J.M. Buhmann, M.H. Chehreghani, M. Frank
JMLR W&CP 27:51–64, 2012.
Exploratory data analysis requires (i) defining a set of patterns
hypothesized to exist in the data, (ii) specifying a suitable
quantification principle or cost function to rank these patterns, and
(iii) validating the inferred patterns. For data clustering, the
patterns are object partitionings into k groups; for PCA or truncated
SVD, the patterns are orthogonal transformations with projections to a
low-dimensional space.
We propose an
information theoretic principle for model selection and model-order
selection. Our principle
ranks competing pattern cost functions according to their ability to extract
context-sensitive information from noisy data with respect to the chosen
hypothesis class. Sets of
approximative solutions serve as a basis for a communication protocol.
Analogous to ?
inferred models maximize the so-called approximation capacity, which is the mutual
information between coarsened training data patterns and coarsened test
data patterns. We
demonstrate how to apply our validation framework to the well-known Gaussian mixture
model and to a multi-label clustering approach for role mining in
binary user privilege assignments.
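
The quantity named above, mutual information between patterns inferred from training data and from test data, can be illustrated in a much simplified setting. The sketch below is not the approximation capacity itself, which is defined over sets of approximative solutions. It only computes the plain empirical mutual information between two hard cluster labelings of the same objects. The function name `mutual_information` and the toy label lists are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import log2


def mutual_information(labels_a, labels_b):
    """Empirical mutual information (in bits) between two label sequences.

    labels_a[i] and labels_b[i] are the labels assigned to object i,
    for example by clustering a training set and a test set of the
    same objects. This is an illustrative stand-in, not the paper's
    approximation capacity.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("label sequences must be non-empty")

    count_a = Counter(labels_a)                  # marginal counts of A
    count_b = Counter(labels_b)                  # marginal counts of B
    count_ab = Counter(zip(labels_a, labels_b))  # joint counts of (A, B)

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Two toy labelings of four objects (assumed data, for illustration).
train_labels = [0, 0, 1, 1]
test_labels_same = [0, 0, 1, 1]   # agrees with the training labeling
test_labels_indep = [0, 1, 0, 1]  # statistically independent of it

mi_same = mutual_information(train_labels, test_labels_same)
mi_indep = mutual_information(train_labels, test_labels_indep)
```

A labeling that reproduces on test data carries one bit about the training labeling in this toy case, while an independent labeling carries none. This matches the intuition that models whose patterns generalize across data sets should score higher.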