Stochastic Nested Variance Reduction for Nonconvex Optimization

Dongruo Zhou, Pan Xu, Quanquan Gu; 21(103):1−63, 2020.

Abstract

We study nonconvex optimization problems, where the objective function is either an average of $n$ nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, namely, Stochastic Nested Variance-Reduced Gradient descent (SNVRG). Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses $K+1$ nested reference points to build a semi-stochastic gradient to further reduce its variance in each iteration. For smooth nonconvex functions, SNVRG converges to an $\epsilon$-approximate first-order stationary point within $\tilde{O}(n\wedge\epsilon^{-2}+\epsilon^{-3}\wedge n^{1/2}\epsilon^{-2})$ stochastic gradient evaluations. This improves the best known gradient complexity of SVRG, $O(n+n^{2/3}\epsilon^{-2})$, and that of SCSG, $O(n\wedge\epsilon^{-2}+\epsilon^{-10/3}\wedge n^{2/3}\epsilon^{-2})$. For gradient dominated functions, SNVRG also achieves better gradient complexity than the state-of-the-art algorithms. Based on SNVRG, we further propose two algorithms that can find local minima faster than state-of-the-art algorithms in both finite-sum and general stochastic (online) nonconvex optimization. In particular, for finite-sum optimization problems, the proposed SNVRG + Neon2$^{\text{finite}}$ algorithm achieves $\tilde{O}(n^{1/2}\epsilon^{-2}+n\epsilon_H^{-3}+n^{3/4}\epsilon_H^{-7/2})$ gradient complexity to converge to an $(\epsilon,\epsilon_H)$-second-order stationary point, which outperforms SVRG + Neon2$^{\text{finite}}$ (Allen-Zhu and Li, 2018), the best existing algorithm, in a wide regime. For general stochastic optimization problems, the proposed SNVRG + Neon2$^{\text{online}}$ achieves $\tilde{O}(\epsilon^{-3}+\epsilon_H^{-5}+\epsilon^{-2}\epsilon_H^{-3})$ gradient complexity, which is better than both SVRG + Neon2$^{\text{online}}$ (Allen-Zhu and Li, 2018) and Natasha2 (Allen-Zhu, 2018a) in certain regimes. Thorough experimental results on different nonconvex optimization problems back up our theory.
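The abstract's key construction, a semi-stochastic gradient built from $K+1$ nested reference points rather than SVRG's single outer reference, can be illustrated with a short sketch. The code below is a minimal illustration under simplifying assumptions: `grad_i(x, i)` is a hypothetical callback returning the component gradient $\nabla f_i(x)$, and the reference-point schedule and batch sizes are placeholders, not the choices specified in the paper (which are what drive the stated complexities).

```python
import numpy as np

def svrg_gradient(grad_i, x, x_ref, full_grad_ref, i):
    """SVRG-style semi-stochastic gradient: one sampled index and one
    reference point x_ref whose full gradient full_grad_ref was precomputed."""
    return grad_i(x, i) - grad_i(x_ref, i) + full_grad_ref

def nested_vr_gradient(grad_i, refs, batches):
    """Nested variance-reduced gradient in the spirit of SNVRG (sketch only).

    refs    : list of K+1 reference points [x(0), ..., x(K)], with x(K) the
              current iterate and x(0) the outermost (least frequently
              updated) reference point.
    batches : list of K+1 index batches, one per level; the sizes used here
              are illustrative, not the schedule analyzed in the paper.
    """
    # Level 0: batch gradient at the outermost reference point.
    v = np.mean([grad_i(refs[0], i) for i in batches[0]], axis=0)
    # Levels 1..K: telescoping corrections between consecutive reference
    # points, each estimated on its own (typically smaller) batch.
    for l in range(1, len(refs)):
        v = v + np.mean(
            [grad_i(refs[l], i) - grad_i(refs[l - 1], i) for i in batches[l]],
            axis=0,
        )
    return v
```

Because consecutive reference points stay close to each other, each telescoping correction has small variance, so the nested estimator can track the full gradient more tightly than SVRG's single correction while only the outermost level requires a large batch.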

© JMLR 2020.
