## Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging

*Shusen Wang, Alex Gittens, Michael W. Mahoney*; 18(218):1−50, 2018.

### Abstract

We address the statistical and optimization impacts of the
classical sketch and Hessian sketch used to approximately solve
the Matrix Ridge Regression (MRR) problem. Prior research has
quantified the effects of classical sketch on the strictly
simpler least squares regression (LSR) problem. We establish
that classical sketch has a similar effect upon the optimization
properties of MRR as it does on those of LSR: namely, it
recovers nearly optimal solutions. By contrast, Hessian sketch
does not have this guarantee; instead, the approximation error
is governed by a subtle interplay between the “mass” in the
responses and the optimal objective value. For both types of
approximation, the regularization in the sketched MRR problem
results in significantly different statistical properties from
those of the sketched LSR problem. In particular, there is a
bias-variance trade-off in sketched MRR that is not present in
sketched LSR. We provide upper and lower bounds on the bias and
variance of sketched MRR; these bounds show that classical
sketch significantly increases the variance, while Hessian
sketch significantly increases the bias. Empirically, sketched
MRR solutions can have risks that are higher by an order-of-
magnitude than those of the optimal MRR solutions. We establish
theoretically and empirically that model averaging greatly
decreases the gap between the risks of the true and sketched
solutions to the MRR problem. Thus, in parallel or distributed
settings, sketching combined with model averaging is a powerful
technique that quickly obtains near-optimal solutions to the MRR
problem while greatly mitigating the increased statistical risk
incurred by sketching.

[abs][pdf][bib]