## Classifying With Confidence From Incomplete Information

*Nathan Parrish, Hyrum S. Anderson, Maya R. Gupta, Dun Yu Hsiao*; 14(Dec):3561−3589, 2013.

### Abstract

We consider the problem of classifying a test sample given
incomplete information. This problem arises naturally when data
about a test sample is collected over time, or when costs must
be incurred to compute the classification features. For example,
in a distributed sensor network only a fraction of the sensors
may have reported measurements at a certain time, and additional
time, power, and bandwidth are needed to collect the complete
data to classify. A practical goal is to assign a class label as
soon as enough data is available to make a good decision. We
formalize this goal through the notion of reliability: the
probability that a label assigned given incomplete data would be
the same as the label assigned given the complete data. We
propose a method that classifies incomplete data only if some
reliability threshold is met. Our approach models the complete
data as a random variable whose distribution depends on the
current incomplete data and the (complete) training data. The
method differs from standard imputation strategies in that our
focus is on determining the reliability of the classification
decision, rather than just the class label. We show that the
method provides useful estimates of how reliable the imputed
class labels are in a set of experiments on time-series data
sets, where the goal is to classify each time series as early
as possible while still guaranteeing that the reliability
threshold is met.
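The reliability idea can be illustrated with a minimal Monte Carlo sketch. This is not the authors' method; it rests on assumptions of our own. The complete feature vector is assumed to be jointly Gaussian, with mean and covariance estimated from the complete training data. `classify` is assumed to be any fixed classifier trained on complete data. The missing features are drawn from their Gaussian conditional distribution given the observed ones. The reliability of a label `y` is then estimated as the fraction of sampled completions that `classify` assigns to `y`, and the label with the highest estimate is returned. The hypothetical helper `classify_early` reveals one time step at a time and stops at the first step whose estimated reliability reaches the threshold `tau`. All function names, the parameter `n_samples`, and the synthetic data are illustrative.

```python
import numpy as np


def conditional_gaussian(mean, cov, obs_idx, mis_idx, obs_values):
    """Mean and covariance of the missing features given the observed ones,
    assuming the complete feature vector is jointly Gaussian."""
    if obs_idx.size == 0:
        return mean[mis_idx], cov[np.ix_(mis_idx, mis_idx)]
    s_oo = cov[np.ix_(obs_idx, obs_idx)]
    s_mo = cov[np.ix_(mis_idx, obs_idx)]
    s_mm = cov[np.ix_(mis_idx, mis_idx)]
    gain = np.linalg.solve(s_oo, s_mo.T).T  # Sigma_mo @ inv(Sigma_oo)
    cond_mean = mean[mis_idx] + gain @ (obs_values - mean[obs_idx])
    cond_cov = s_mm - gain @ s_mo.T
    return cond_mean, 0.5 * (cond_cov + cond_cov.T)  # symmetrize round-off


def reliability(classify, x_partial, mean, cov, n_samples=2000, rng=None):
    """Return (label, estimated reliability) for a feature vector whose
    unobserved entries are np.nan. Reliability is estimated by Monte Carlo:
    the fraction of sampled completions that the classifier gives the label."""
    rng = np.random.default_rng(rng)
    obs_idx = np.flatnonzero(~np.isnan(x_partial))
    mis_idx = np.flatnonzero(np.isnan(x_partial))
    if mis_idx.size == 0:  # complete data: the label is certain
        return classify(x_partial), 1.0
    cond_mean, cond_cov = conditional_gaussian(
        mean, cov, obs_idx, mis_idx, x_partial[obs_idx])
    completed = np.tile(x_partial, (n_samples, 1))
    completed[:, mis_idx] = rng.multivariate_normal(cond_mean, cond_cov, size=n_samples)
    labels = np.array([classify(x) for x in completed])
    values, counts = np.unique(labels, return_counts=True)
    best = counts.argmax()  # label with the highest estimated reliability
    return values[best], counts[best] / n_samples


def classify_early(classify, x_full, mean, cov, tau, rng=None):
    """Reveal one time step at a time; stop at the first step whose estimated
    reliability reaches tau (assumed to lie in (0, 1])."""
    rng = np.random.default_rng(rng)
    x = np.full(len(x_full), np.nan)
    for t in range(len(x_full)):
        x[t] = x_full[t]
        label, r = reliability(classify, x, mean, cov, rng=rng)
        if r >= tau:
            return label, t + 1, r
    return classify(x), len(x_full), 1.0  # not reached when tau <= 1


# Illustrative use on synthetic, strongly autocorrelated length-5 series.
rng = np.random.default_rng(0)
T = 5
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
X_train = rng.multivariate_normal(np.zeros(T), 0.9 ** lags, size=500)
mean, cov = X_train.mean(axis=0), np.cov(X_train, rowvar=False)
classify = lambda x: int(x.sum() > 0)  # stand-in classifier for complete data
x_test = np.array([1.5, 1.4, 1.6, 1.2, 1.3])
label, steps, r = classify_early(classify, x_test, mean, cov, tau=0.95, rng=rng)
print(f"label={label}, time steps used={steps}/{T}, reliability={r:.3f}")
```

Returning the label with the largest estimated agreement makes the reported reliability the highest achievable under the assumed model. The estimates are only as trustworthy as that model. For example, a Gaussian model fitted to strongly non-Gaussian features could report high reliability for labels that often change once the complete data arrives.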
