Semi-Supervised Interpolation in an Anticausal Learning Scenario

Dominik Janzing; Bernhard Schölkopf

According to a recently stated 'independence postulate', the distribution $P_{\rm cause}$ contains no information about the conditional $P_{\rm effect | cause}$, while $P_{\rm effect}$ may contain information about $P_{\rm cause | effect}$. Since semi-supervised learning (SSL) attempts to exploit information from $P_X$ to assist in predicting $Y$ from $X$, it should only work in the anticausal direction, i.e., when $Y$ is the cause and $X$ is the effect. In the causal direction, when $X$ is the cause and $Y$ the effect, unlabelled $x$-values should be useless. To shed light on this asymmetry, we study a deterministic causal relation $Y=f(X)$ as recently studied in Information-Geometric Causal Inference (IGCI). Within this model, we discuss two options to formalize the independence of $P_X$ and $f$ as an orthogonality of vectors in appropriate inner product spaces. We prove that unlabelled data help for the problem of interpolating a monotonically increasing function if and only if the orthogonality conditions are violated, which we expect only for the anticausal direction. Here, the performance of SSL and of its supervised baseline analogue is measured in terms of two different loss functions: first, the mean squared error, and second, the surprise in a Bayesian prediction scenario.
