Processing math: 100%

Stochastic Nested Variance Reduction for Nonconvex Optimization

Dongruo Zhou, Pan Xu, Quanquan Gu.

Year: 2020, Volume: 21, Issue: 103, Pages: 1−63


Abstract

We study nonconvex optimization problems, where the objective function is either an average of n nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, namely, Stochastic Nested Variance-Reduced Gradient descent (SNVRG). Compared with conventional stochastic variance reduced gradient (SVRG) algorithm that uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses K+1 nested reference points to build a semi-stochastic gradient to further reduce its variance in each iteration. For smooth nonconvex functions, SNVRG converges to an ϵ-approximate first-order stationary point within ˜O(nϵ2+ϵ3n1/2ϵ2) number of stochastic gradient evaluations. This improves the best known gradient complexity of SVRG O(n+n2/3ϵ2) and that of SCSG O(nϵ2+ϵ10/3n2/3ϵ2). For gradient dominated functions, SNVRG also achieves better gradient complexity than the state-of-the-art algorithms. Based on SNVRG, we further propose two algorithms that can find local minima faster than state-of-the-art algorithms in both finite-sum and general stochastic (online) nonconvex optimization. In particular, for finite-sum optimization problems, the proposed SNVRG+Neon2finite algorithm achieves ˜O(n1/2ϵ2+nϵ3H+n3/4ϵ7/2H) gradient complexity to converge to an (ϵ,ϵH)-second-order stationary point, which outperforms SVRG+Neon2finite (Allen-Zhu and Li, 2018), the best existing algorithm, in a wide regime. For general stochastic optimization problems, the proposed SNVRG+Neon2online achieves ˜O(ϵ3+ϵ5H+ϵ2ϵ3H) gradient complexity, which is better than both SVRG+Neon2online (Allen-Zhu and Li, 2018) and Natasha2 (Allen-Zhu, 2018a) in certain regimes. Thorough experimental results on different nonconvex optimization problems back up our theory.

PDF BibTeX