Kenneth Joseph Ryan, Mark Vere Culp.
Year: 2015, Volume: 16, Issue: 99, Pages: 3183−3217
Semi-supervised learning approaches are trained using the full training (labeled) data and available testing (unlabeled) data. Demonstrations of the value of training with unlabeled data typically depend on a smoothness assumption relating the conditional expectation to high density regions of the marginal distribution and an inherent missing completely at random assumption for the labeling. So-called covariate shift poses a challenge for many existing semi-supervised or supervised learning techniques. Covariate shift models allow the marginal distributions of the labeled and unlabeled feature data to differ, but the conditional distribution of the response given the feature data is the same. An example of this occurs when a complete labeled data sample and then an unlabeled sample are obtained sequentially, as it would likely follow that the distributions of the feature data are quite different between samples. The value of using unlabeled data during training for the elastic net is justified geometrically in such practical covariate shift problems. The approach works by obtaining adjusted coefficients for unlabeled prediction which recalibrate the supervised elastic net to compromise: (i) maintaining elastic net predictions on the labeled data with (ii) shrinking unlabeled predictions to zero. Our approach is shown to dominate linear supervised alternatives on unlabeled response predictions when the unlabeled feature data are concentrated on a low dimensional manifold away from the labeled data and the true coefficient vector emphasizes directions away from this manifold. Large variance of the supervised predictions on the unlabeled set is reduced more than the increase in squared bias when the unlabeled responses are expected to be small, so an improved compromise within the bias-variance tradeoff is the rationale for this performance improvement. Performance is validated on simulated and real data.