The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces

Sugiyama, Masashi; Müller, Klaus-Robert

The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces

Masashi Sugiyama, Klaus-Robert Müller; 3(Nov):323-359, 2002.

Abstract

A central problem in learning is selection of an appropriate model. This is typically done by estimating the unknown generalization errors of a set of models to be selected from and then choosing the model with minimal generalization error estimate. In this article, we discuss the problem of model selection and generalization error estimation in the context of kernel regression models, e.g., kernel ridge regression, kernel subset regression or Gaussian process regression. Previously, a non-asymptotic generalization error estimator called the subspace information criterion (SIC) was proposed, that could be successfully applied to finite dimensional subspace models. SIC is an unbiased estimator of the generalization error for the finite sample case under the conditions that the learning target function belongs to a specified reproducing kernel Hilbert space (RKHS) H and the reproducing kernels centered on training sample points span the whole space H. These conditions hold only if dim H < l, where l < infinity is the number of training examples. Therefore, SIC could be applied only to finite dimensional RKHSs. In this paper, we extend the range of applicability of SIC, and show that even if the reproducing kernels centered on training sample points do not span the whole space H, SIC is an unbiased estimator of an essential part of the generalization error. Our extension allows the use of any RKHSs including infinite dimensional ones, i.e., richer function classes commonly used in Gaussian processes, support vector machines or boosting. We further show that when the kernel matrix is invertible, SIC can be expressed in a much simpler form, making its computation highly efficient. In computer simulations on ridge parameter selection with real and artificial data sets, SIC is compared favorably with other standard model selection techniques for instance leave-one-out cross-validation or an empirical Bayesian method.

[abs] [pdf] [ps.gz] [ps]