## Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs

*Braxton Osting, Christoph Brune, Stanley J. Osher*; 15(Oct):2981−3012, 2014.

### Abstract

Given a graph where vertices represent alternatives and arcs
represent pairwise comparison data, the statistical ranking
problem is to find a potential function, defined on the
vertices, such that the gradient of the potential function
agrees with the pairwise comparisons. Our goal in this paper is
to develop a method for collecting data for which the least
squares estimator for the ranking problem has maximal Fisher
information. Our approach, based on experimental design, is to
view data collection as a bi-level optimization problem where
the inner problem is the ranking problem and the outer problem
is to identify data which maximizes the informativeness of the
ranking. Under certain assumptions, the data collection problem
decouples, reducing to a problem of finding multigraphs with
large algebraic connectivity. This reduction of the data
collection problem to graph-theoretic questions is one of the
primary contributions of this work. As an application, we study
the Yahoo! Movie user rating data set and demonstrate that the
addition of a small number of well-chosen pairwise comparisons
can significantly increase the Fisher informativeness of the
ranking. As another application, we study the 2011-12 NCAA
football schedule and propose schedules with the same number of
games which are significantly more informative. Using spectral
clustering methods to identify highly-connected communities
within the division, we argue that the NCAA could improve its
notoriously poor rankings by simply scheduling more out-of-
conference games.

[abs][pdf][bib]