## Adaptive Minimax Regression Estimation over Sparse $\ell_q$-Hulls

*Zhan Wang, Sandra Paterlini, Fuchang Gao, Yuhong Yang*; 15(May):1675−1711, 2014.

### Abstract

Given a dictionary of $M_n$ predictors, in a random design
regression setting with $n$ observations, we construct
estimators that target the best performance among all the linear
combinations of the predictors under a sparse $\ell_q$-norm ($0
\leq q \leq 1$) constraint on the linear coefficients. Besides
identifying the optimal rates of convergence, our universal
aggregation strategies by model mixing achieve the optimal rates
simultaneously over the full range of $0\leq q \leq 1$ for any
$M_n$ and without knowledge of the $\ell_q$-norm of the best
linear coefficients to represent the regression function. To
allow model misspecification, our upper bound results are
obtained in a framework of aggregation of estimates. A striking
feature is that no specific relationship among the predictors is
needed to achieve the upper rates of convergence (hence
permitting basically arbitrary correlations between the
predictors). Therefore, whatever the true regression function
(assumed to be uniformly bounded), our estimators automatically
exploit any sparse representation of it, to the best extent
possible within the $\ell_q$-constrained linear combinations for
any $0 \leq q \leq 1$. A sparse approximation result in the
$\ell_q$-hulls turns
out to be crucial to adaptively achieve minimax rate optimal
aggregation. It precisely characterizes the number of terms
needed to achieve a prescribed accuracy of approximation to the
best linear combination in an $\ell_q$-hull for $0 \leq q \leq
1$. It offers the insight that the minimax rate of
$\ell_q$-aggregation is basically determined by an effective
model size, which is a sparsity index that depends on $q$,
$M_n$, $n$, and the $\ell_q$-norm bound in an easily
interpretable way based on a classical model selection theory
that deals with a large number of models.
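To make the $\ell_q$-norm constraint concrete: for $0 < q \leq 1$ the constraint is $\bigl(\sum_j |\theta_j|^q\bigr)^{1/q} \leq t$ on the linear coefficients $\theta_j$, while at $q = 0$ it degenerates to a bound on the number of nonzero coefficients, which is why small $q$ encodes sparsity. A minimal sketch (the function name `lq_norm` and the example coefficients are ours, for illustration only):

```python
import numpy as np

def lq_norm(theta, q):
    """Lq-(quasi-)norm of a coefficient vector for 0 <= q <= 1.

    q = 0 counts the nonzero entries (a sparsity index);
    0 < q <= 1 gives (sum |theta_j|^q)^(1/q), a quasi-norm for q < 1
    and the usual l1 norm at q = 1.
    """
    theta = np.asarray(theta, dtype=float)
    if q == 0:
        return float(np.count_nonzero(theta))
    return float(np.sum(np.abs(theta) ** q) ** (1.0 / q))

# Example: the same coefficient vector under different q.
theta = [1.0, -2.0, 0.0, 3.0]
print(lq_norm(theta, 0))    # number of nonzero coefficients: 3.0
print(lq_norm(theta, 1))    # l1 norm: 6.0
print(lq_norm(theta, 0.5))  # quasi-norm, larger than the l1 norm
```

Note that for $q < 1$ the quasi-norm penalizes spreading mass across many coefficients more heavily than the $\ell_1$ norm does, so a fixed bound $t$ forces increasingly sparse representations as $q \to 0$.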
