Approximate Submodularity and its Applications: Subset Selection, Sparse Approximation and Dictionary Selection

Abhimanyu Das; David Kempe

We introduce the submodularity ratio as a measure of how “close” to submodular a set function $f$ is. We show that when $f$ has submodularity ratio $\gamma$, the greedy algorithm for maximizing $f$ provides a $(1-e^{-\gamma})$-approximation. Furthermore, when $\gamma$ is bounded away from 0, the greedy algorithm for minimum submodular cover also provides essentially an $O(\log n)$ approximation for a universe of $n$ elements. As a main application of this framework, we study the problem of selecting a subset of $k$ random variables from a large set in order to obtain the best linear prediction of another variable of interest. We analyze the performance of widely used greedy heuristics. In particular, we show that the submodularity ratio is lower-bounded by the smallest $2k$-sparse eigenvalue of the covariance matrix, and from this we obtain the strongest known approximation guarantees for the Forward Regression and Orthogonal Matching Pursuit algorithms. As a second application, we analyze greedy algorithms for the dictionary selection problem and significantly improve the previously known guarantees. Our theoretical analysis is complemented by experiments on real-world and synthetic data sets; in particular, we analyze how well various spectral parameters and the submodularity ratio predict the performance of the greedy algorithms.
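The greedy guarantee for subset selection can be checked on a small example. This sketch assumes the submodularity ratio of $f$ with respect to a set $U$ and budget $k$ is $\gamma_{U,k}(f) = \min_{L \subseteq U,\ S:\ |S| \le k,\ S \cap L = \emptyset} \frac{\sum_{x \in S} \left(f(L \cup \{x\}) - f(L)\right)}{f(L \cup S) - f(L)}$. It also takes $f(S) = R^2_{Z,S}$, the fraction of the variance of $Z$ explained by the best linear predictor built from the variables in $S$. The Python below is a minimal, standard-library-only illustration, not the authors' code. It assumes that all variables are centered and have unit variance, so that $R^2_{Z,S} = b_S^\top C_S^{-1} b_S$, where $C$ is the covariance matrix of the $X_i$ and $b_i = \mathrm{Cov}(X_i, Z)$. It runs Forward Regression, which repeatedly adds the variable with the largest $R^2$ gain, and then computes a brute-force lower bound on $\gamma$ for a hypothetical 4-variable example. The helper names (`solve`, `r_squared`, `forward_regression`, `submodularity_ratio`) and the toy covariance values are illustrative assumptions.

```python
"""Illustrative sketch: Forward Regression and a brute-force submodularity ratio."""
from itertools import combinations
from math import exp


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A assumed nonsingular)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def r_squared(C, b, S):
    """R^2 of the best linear predictor of Z from {X_i : i in S}.

    Assumes centered, unit-variance variables, so R^2 = b_S^T C_S^{-1} b_S,
    where C = Cov(X) and b_i = Cov(X_i, Z).
    """
    idx = sorted(S)
    if not idx:
        return 0.0
    C_S = [[C[i][j] for j in idx] for i in idx]
    b_S = [b[i] for i in idx]
    coef = solve(C_S, b_S)  # regression coefficients
    return sum(bi * ci for bi, ci in zip(b_S, coef))


def forward_regression(C, b, k):
    """Greedy Forward Regression: add the variable with the largest R^2 gain, k times."""
    S = set()
    for _ in range(k):
        candidates = [i for i in range(len(b)) if i not in S]
        best = max(candidates, key=lambda i: r_squared(C, b, S | {i}))
        S.add(best)
    return S


def submodularity_ratio(f, n, k):
    """Brute-force minimum of the ratio over disjoint L, S with |L| <= k, 1 <= |S| <= k.

    Minimizing over every such L (not only subsets of a fixed U) lower-bounds
    gamma_{U,k}(f) for any U with |U| <= k. Exponential in n: toy sizes only.
    """
    gamma = 1.0  # a singleton S gives ratio exactly 1, so gamma <= 1
    for size_L in range(k + 1):
        for L in combinations(range(n), size_L):
            L = set(L)
            base = f(L)
            rest = [i for i in range(n) if i not in L]
            for size_S in range(1, k + 1):
                for S in combinations(rest, size_S):
                    joint = f(L | set(S)) - base
                    if joint <= 1e-12:
                        continue  # 0/0 case for monotone f; treated as ratio 1
                    marginals = sum(f(L | {x}) - base for x in S)
                    gamma = min(gamma, marginals / joint)
    return gamma


if __name__ == "__main__":
    # Hypothetical toy example: X0 and X1 are correlated (rho = 0.5), but only X0
    # is correlated with Z, so X1 is a "suppressor" and R^2 is not submodular.
    C = [[1.0, 0.5, 0.0, 0.0],
         [0.5, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.2],
         [0.0, 0.0, 0.2, 1.0]]
    b = [0.4, 0.0, 0.3, 0.3]
    k = 2
    f = lambda S: r_squared(C, b, S)
    greedy = forward_regression(C, b, k)
    opt = max(f(set(T)) for T in combinations(range(len(b)), k))
    gamma = submodularity_ratio(f, len(b), k)
    print("greedy set:", sorted(greedy), "R^2:", f(greedy), "optimum:", opt)
    print("gamma >=", gamma, "guaranteed fraction:", 1 - exp(-gamma))
```

On this toy instance, the brute-force bound is $\gamma = 0.75$. The minimum comes from the pair $\{X_0, X_1\}$, where $X_1$ helps only in combination with $X_0$. The bound therefore guarantees that Forward Regression attains at least $1 - e^{-0.75} \approx 0.53$ of the optimal $R^2$. The exhaustive search is only meant for sanity-checking small cases.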


Abstract