Generalized Optimal Reverse Prediction
Martha White, Dale Schuurmans ; JMLR W&CP 22: 1305-1313, 2012.
Recently it has been shown that classical supervised and unsupervised training methods can be unified as special cases of so-called "optimal reverse prediction": predicting inputs from target labels while optimizing over both model parameters and missing labels. Although this perspective establishes links between classical training principles, the existing formulation only applies to linear predictors under squared loss, hence is extremely limited. We generalize the formulation of optimal reverse prediction to arbitrary Bregman divergences, and more importantly to nonlinear predictors. This extension is achieved by establishing a new, generalized form of forward-reverse minimization equivalence that holds for arbitrary matching losses. Several benefits follow. First, a new variant of Bregman divergence clustering can be recovered that incorporates a non-linear data reconstruction model. Second, normalized-cut and kernel-based extensions can be formulated coherently. Finally, a new semi-supervised training principle can be recovered for classification problems that demonstrates advantages over the state of the art.