## On the Propagation of Low-Rate Measurement Error to Subgraph Counts in Large Networks

*Prakash Balachandran, Eric D. Kolaczyk, Weston D. Viles*; 18(61):1−33, 2017.

### Abstract

Our work in this paper is inspired by a statistical observation
that is both elementary and broadly relevant to network analysis
in practice: that the uncertainty in approximating some true
graph $G=(V,E)$ by some estimated graph $\hat{G}=(V,\hat{E})$
manifests as errors in our knowledge of the presence/absence of
edges between vertex pairs, which must necessarily propagate to
any estimates of network summaries $\eta(G)$ we seek. Motivated
by the common practice of using plug-in estimates
$\eta(\hat{G})$ as proxies for $\eta(G)$, our focus is on the
problem of characterizing the distribution of the discrepancy
$D=\eta(\hat{G}) - \eta(G)$, in the case where $\eta(\cdot)$ is
a subgraph count. Specifically, we study the fundamental case
where the statistic of interest is $|E|$, the number of edges in
$G$. Our primary contribution in this paper is to show that in
the empirically relevant setting of large graphs with low-rate
measurement errors, the distribution of $D_E=|\hat{E}| - |E|$ is
well-characterized by a Skellam distribution, when the errors
are independent or weakly dependent. Under an assumption of
independent errors, we are able to further show conditions under
which this characterization is strictly better than that of an
appropriate normal distribution. These results derive from our
formulation of a general result, quantifying the accuracy with
which the difference of two sums of dependent Bernoulli random
variables may be approximated by the difference of two
independent Poisson random variables, i.e., by a Skellam
distribution. This general result is developed through the use
of Stein's method, and may be of some general interest. We
finish with a discussion of possible extensions of our work to
subgraph counts $\eta(G)$ of higher order.
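
To make the central claim concrete, the following is a minimal numerical sketch of the simplest setting, not the paper's own analysis. It assumes independent errors with homogeneous rates: each true non-edge is falsely reported with probability `alpha`, and each true edge is missed with probability `beta`. Under those assumptions $D_E$ is the difference of two independent binomial counts (false positives minus false negatives), and the code compares its exact distribution with a Skellam distribution whose parameters are the two expected counts. The graph size, the rates, and all function names are hypothetical choices made for illustration.

```python
import math

def binom_pmf(n, p, kmax):
    """Binomial(n, p) pmf on 0..kmax, using log-gamma so large n is stable."""
    out = []
    for k in range(kmax + 1):
        log_pk = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                  + k * math.log(p) + (n - k) * math.log1p(-p))
        out.append(math.exp(log_pk))
    return out

def poisson_pmf(mu, kmax):
    """Poisson(mu) pmf on 0..kmax."""
    return [math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
            for k in range(kmax + 1)]

def difference_pmf(pmf_plus, pmf_minus):
    """pmf of X - Y for independent X, Y supported on 0..kmax (dict: value -> prob)."""
    out = {}
    for i, a in enumerate(pmf_plus):
        for j, b in enumerate(pmf_minus):
            out[i - j] = out.get(i - j, 0.0) + a * b
    return out

def total_variation(p, q):
    """Total variation distance between two pmfs given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical large, sparse graph with low-rate errors (illustrative values only).
n_vertices = 1000
n_pairs = n_vertices * (n_vertices - 1) // 2   # all vertex pairs
n_edges = 5000                                 # |E|
n_non_edges = n_pairs - n_edges
alpha = 1e-5                                   # assumed false-positive rate per non-edge
beta = 1e-3                                    # assumed false-negative rate per edge
kmax = 60                                      # truncation point; tail mass beyond it is negligible here

# Exact law of D_E = (#false positives) - (#false negatives) under independence.
exact = difference_pmf(binom_pmf(n_non_edges, alpha, kmax),
                       binom_pmf(n_edges, beta, kmax))

# Skellam(mu1, mu2): difference of independent Poissons with matching means.
mu1 = n_non_edges * alpha
mu2 = n_edges * beta
skellam = difference_pmf(poisson_pmf(mu1, kmax), poisson_pmf(mu2, kmax))

tv = total_variation(exact, skellam)
print(f"mu1={mu1:.3f}, mu2={mu2:.3f}, TV(exact, Skellam)={tv:.2e}")
```

The total variation distance printed at the end gives a rough sense of how closely the Skellam law tracks the exact distribution for these particular values. The dependent-error case and the comparison with a normal approximation, both treated in the paper, are not covered by this sketch.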
