Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
Marcus Hutter; 4(Nov):971-1000, 2003.
Abstract
Various optimality properties of universal sequence predictors
based on Bayes-mixtures in general, and Solomonoff's prediction
scheme in particular, will be studied.
The probability of observing x_t at time t, given past
observations x_1 ... x_{t-1}, can be computed with the chain rule
if the true generating distribution μ of the sequences
x_1 x_2 x_3 ... is known. If μ is unknown, but known to belong
to a countable or continuous class M, one can base one's
prediction on the Bayes-mixture ξ, defined as a
w_ν-weighted sum or integral of the distributions ν ∈ M.
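In symbols (a sketch of the definitions as stated in this abstract; the paper's formal setup includes normalization details and loss functions omitted here):

    ξ(x_1 ... x_t) = Σ_{ν ∈ M} w_ν · ν(x_1 ... x_t),      Σ_{ν ∈ M} w_ν ≤ 1,
    ξ(x_t | x_1 ... x_{t-1}) = ξ(x_1 ... x_t) / ξ(x_1 ... x_{t-1})   (chain rule).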
cumulative expected loss of the Bayes-optimal universal prediction
scheme based on ξ is shown to be close to the loss of the
Bayes-optimal, but infeasible prediction scheme based on μ. We
show that the bounds are tight and that no other predictor can
lead to significantly smaller bounds.
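Loosely, the bounds have the following flavor (the precise statement and constants are given in the paper; L_n denotes cumulative expected loss over the first n steps, and w_μ is the prior weight of the true distribution μ):

    0 ≤ L_n^ξ − L_n^μ ≤ O( √(L_n^μ · ln(1/w_μ)) + ln(1/w_μ) ),

so in particular L_n^ξ / L_n^μ → 1 whenever L_n^μ → ∞.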
Furthermore, for various performance measures, we show
Pareto-optimality of ξ and give an Occam's razor argument that
the choice w_ν ∼ 2^{-K(ν)} for the weights is optimal,
where K(ν) is the length of the shortest program describing ν.
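As a toy illustration of the mixture mechanism (not the paper's construction: K(ν) is uncomputable, so the 2^{-K(ν)} weights below are hand-picked stand-ins for a small hypothetical class of Bernoulli models):

```python
import math
import random

# Toy Bayes-mixture predictor over a finite class M of Bernoulli models.
# Illustrative sketch only: the Solomonoff-style weights w_nu ~ 2^(-K(nu))
# are uncomputable, so we substitute hand-chosen prior weights
# (intuitively, "simpler" model => larger weight).

class BayesMixture:
    def __init__(self, thetas, weights):
        self.thetas = thetas                          # nu_i(x_t = 1) = thetas[i]
        self.log_w = [math.log(w) for w in weights]   # log unnormalized posterior weights

    def predict(self):
        # xi(x_t = 1 | x_<t) = sum over nu of posterior(nu) * nu(1)
        m = max(self.log_w)
        post = [math.exp(lw - m) for lw in self.log_w]
        z = sum(post)
        return sum(p * th for p, th in zip(post, self.thetas)) / z

    def update(self, x):
        # Bayes' rule: multiply each weight by the model's likelihood nu(x | x_<t)
        for i, th in enumerate(self.thetas):
            self.log_w[i] += math.log(th if x == 1 else 1.0 - th)

# Usage: the posterior concentrates on the model nearest the data source.
random.seed(0)
mix = BayesMixture(thetas=[0.1, 0.5, 0.9], weights=[0.5, 0.25, 0.25])
for _ in range(200):
    x = 1 if random.random() < 0.9 else 0   # true mu = Bernoulli(0.9)
    mix.update(x)
print(round(mix.predict(), 3))              # close to 0.9
```

As the posterior mass concentrates on the model closest to the data source, the mixture's predictions approach those of the informed predictor, mirroring the loss bounds above.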
The results are applied to games of chance, defined as a sequence
of bets, observations, and rewards.
The prediction schemes (and bounds) are compared to the popular
predictors based on expert advice.
Extensions to infinite alphabets, partial, delayed and
probabilistic prediction, classification, and more active systems
are briefly discussed.