Si Si, Cho-Jui Hsieh, Inderjit S. Dhillon.
Year: 2017, Volume: 18, Issue: 20, Pages: 1−32
Scaling kernel machines to massive data sets is a major challenge due to storage and computation issues in handling large kernel matrices, that are usually dense. Recently, many papers have suggested tackling this problem by using a low-rank approximation of the kernel matrix. In this paper, we first make the observation that the structure of shift-invariant kernels changes from low-rank to block-diagonal (without any low-rank structure) when varying the scale parameter. Based on this observation, we propose a new kernel approximation framework -- Memory Efficient Kernel Approximation (MEKA), which considers both low-rank and clustering structure of the kernel matrix. We show that the resulting algorithm outperforms state-of-the-art low-rank kernel approximation methods in terms of speed, approximation error, and memory usage. As an example, on the covtype dataset with half a million samples, MEKA takes around 70 seconds and uses less than 80 MB memory on a single machine to achieve 10% relative approximation error, while standard NystrÃ¶m approximation is about 6 times slower and uses more than 400MB memory to achieve similar approximation. We also present extensive experiments on applying MEKA to speed up kernel ridge regression.