Computational Bounds on Statistical Query Learning
Vitaly Feldman and Varun Kanade JMLR W&CP 23: 16.1 - 16.22, 2012
Abstract
We study the complexity of learning in Kearns' well-known statistical query (SQ) learning model
(Kearns, 1993). A number of previous works have addressed the definition and estimation of the
information-theoretic bounds on the SQ learning complexity, in other words, bounds on the query
complexity. Here we give the first strictly computational upper and lower bounds on the complexity
of several types of learning in the SQ model.
As it was already observed, the known characterization of distribution-specific SQ learning
(Blum, et al. 1994) implies that for weak learning over a fixed distribution, the query complexity
and computational complexity are essentially the same. In contrast, we show that for both
distribution-specific and distribution-independent (strong) learning there exists a concept class of
polynomial query complexity that is not efficiently learnable unless RP = NP. We then prove that
our distribution-specific lower bound is essentially tight by showing that for every concept class
C of polynomial query complexity there exists a polynomial time algorithm that given access to
random points from any distribution D and an NP oracle, can SQ learn C over D.
We also consider a restriction of the SQ model, the correlational statistical query (CSQ) model
(Bshouty and Feldman, 2001; Feldman, 2008) of learning which is closely-related to Valiant's
model of evolvability (Valiant, 2007). We show a similar separation result for distribution-independent
CSQ learning under a stronger assumption: there exists a concept class of polynomial CSQ query
complexity which is not efficiently learnable unless every problem in W[P] has a randomized fixed
parameter tractable algorithm.
