## Hierarchically Compositional Kernels for Scalable Nonparametric Learning

*Jie Chen, Haim Avron, Vikas Sindhwani*; 18(66):1−42, 2017.

### Abstract

We propose a novel class of kernels to alleviate the high
computational cost of large-scale nonparametric learning with
kernel methods. The proposed kernel is defined based on a
hierarchical partitioning of the underlying data domain, where
the NystrÃ¶m method (a globally low-rank approximation) is
married with a locally lossless approximation in a hierarchical
fashion. The kernel maintains (strict) positive-definiteness.
The corresponding kernel matrix admits a recursively off-
diagonal low-rank structure, which allows for fast linear
algebra computations. Suppressing the factor of data dimension,
the memory and arithmetic complexities for training a regression
or a classifier are reduced from $O(n^2)$ and $O(n^3)$ to
$O(nr)$ and $O(nr^2)$, respectively, where $n$ is the number of
training examples and $r$ is the rank on each level of the
hierarchy. Although other randomized approximate kernels entail
a similar complexity, empirical results show that the proposed
kernel achieves a matching performance with a smaller $r$. We
demonstrate comprehensive experiments to show the effective use
of the proposed kernel on data sizes up to the order of
millions.

[abs][pdf][bib]