## Fast Approximation of Matrix Coherence and Statistical Leverage

** Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, David P. Woodruff**; 13(111):3475−3506, 2012.

### Abstract

The *statistical leverage scores* of a matrix *A* are the squared row-norms of the matrix containing its (top) left singular vectors and the *coherence* is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary *n × d* matrix *A*, with *n >> d*, and that returns as output relative-error approximations to *all* *n* of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of *n* and *d*) in *O(n d log n)* time, as opposed to the *O(nd ^{2})* time required by the naïve algorithm that involves computing an orthogonal basis for the range of

*A*. Our analysis may be viewed in terms of computing a relative-error approximation to an

*under*constrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with

*n ≈ d*, and the extension to streaming environments.

© JMLR 2012. (edit, beta) |