On the Computational and Statistical Complexity of Over-parameterized Matrix Sensing

Jiacheng Zhuo, Jeongyeol Kwon, Nhat Ho, Constantine Caramanis; 25(169):1−47, 2024.

Abstract

We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method when the specified rank is larger than the true rank. We refer to this as over-parameterized matrix sensing. If the ground truth signal $\mathbf{X}^* \in \mathbb{R}^{d \times d}$ is of rank $r$, but we try to recover it using $\mathbf{F} \mathbf{F}^\top$ where $\mathbf{F} \in \mathbb{R}^{d \times k}$ and $k>r$, the existing statistical analysis either no longer holds or produces a vacuous statistical error upper bound (infinity) due to the flat local curvature of the loss function around the global minima. By decomposing the factorized matrix $\mathbf{F}$ into separate column spaces to capture the impact of using $k > r$, we show that $\left\| \mathbf{F}_t \mathbf{F}_t^\top - \mathbf{X}^* \right\|_F^2$ converges sub-linearly to a statistical error of $\tilde{\mathcal{O}} (k d \sigma^2/n)$ after $\tilde{\mathcal{O}}(\frac{\sigma_{r}}{\sigma}\sqrt{\frac{n}{d}})$ iterations, where $\mathbf{F}_t$ is the output of FGD after $t$ iterations, $\sigma^2$ is the variance of the observation noise, $\sigma_{r}$ is the $r$-th largest eigenvalue of $\mathbf{X}^*$, and $n$ is the number of samples. With this precise characterization of the convergence behavior and the statistical error, our results offer a comprehensive picture of the statistical and computational complexity of solving the over-parameterized matrix sensing problem with FGD.
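As a concrete illustration of the setting, the following is a minimal numerical sketch of FGD on the least-squares matrix sensing loss with an over-specified rank $k > r$. Everything in it (Gaussian sensing matrices, the step size, the small random initialization, and the problem sizes) is an assumption made for the example, not the paper's analyzed configuration.

```python
import numpy as np

# Minimal sketch of Factorized Gradient Descent (FGD) for over-parameterized
# matrix sensing: recover a rank-r ground truth X* = U U^T via F F^T with k > r.
# The Gaussian sensing matrices, step size eta, initialization scale, and all
# problem sizes below are illustrative assumptions, not the paper's choices.

rng = np.random.default_rng(0)
d, r, k, n = 20, 2, 5, 2000      # dimension, true rank, specified rank, samples
sigma = 0.01                     # standard deviation of the observation noise

U = rng.standard_normal((d, r))
X_star = U @ U.T                 # rank-r ground truth signal

A = rng.standard_normal((n, d, d))   # sensing matrices A_i
y = np.einsum('nij,ij->n', A, X_star) + sigma * rng.standard_normal(n)

F = 0.01 * rng.standard_normal((d, k))   # small random initialization (assumption)
eta = 0.05 / np.linalg.norm(X_star, 2)   # heuristic step size

for t in range(500):
    residual = np.einsum('nij,ij->n', A, F @ F.T) - y   # <A_i, F F^T> - y_i
    grad_M = np.einsum('n,nij->ij', residual, A) / n    # gradient w.r.t. M = F F^T
    F -= eta * (grad_M + grad_M.T) @ F                  # chain rule through F F^T

print("||F_t F_t^T - X*||_F^2 =", np.linalg.norm(F @ F.T - X_star, 'fro') ** 2)
```

Rerunning this sketch with $k = r$ versus $k > r$ makes the abstract's contrast visible: the exactly parameterized runs converge noticeably faster, consistent with the sub-linear rate described above for the over-parameterized case.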

© JMLR 2024.