Concentration Inequalities for the Missing Mass and for Histogram Rule Error
David McAllester, Luis Ortiz; 4(Oct):895-911, 2003.
Abstract
This paper gives distribution-free concentration
inequalities for the missing mass and the error rate of histogram rules.
Negative association methods can be used to reduce these concentration
problems to concentration questions about independent sums. Although
the sums are independent, they are highly heterogeneous. Such
highly heterogeneous independent sums cannot be analyzed using standard
concentration inequalities such as Hoeffding's inequality, the
Angluin-Valiant bound, Bernstein's inequality, Bennett's inequality,
or McDiarmid's theorem. The concentration inequality for histogram rule error
is motivated by the desire to construct a new class of bounds on the generalization
error of decision trees.
[abs][pdf][ps.gz][ps]