Quanming Yao, James T. Kwok.
Year: 2018, Volume: 18, Issue: 179, Pages: 1−52
The use of convex regularizers allows for easy optimization, though they often produce biased estimation and inferior prediction performance. Recently, nonconvex regularizers have attracted a lot of attention and outperformed convex ones. However, the resultant optimization problem is much harder. In this paper, a popular subclass of $\ell_1$-based nonconvex sparsity-inducing and low-rank regularizers is considered. This includes nonconvex variants of lasso, sparse group lasso, tree- structured lasso, nuclear norm and total variation regularizers. We propose to move the nonconvexity from the regularizer to the loss. The nonconvex regularizer is then transformed to a familiar convex one, while the resultant loss function can still be guaranteed to be smooth. Learning with the convexified regularizer can be performed by existing efficient algorithms originally designed for convex regularizers (such as the proximal algorithm, Frank-Wolfe algorithm, alternating direction method of multipliers and stochastic gradient descent). This is further extended to consider cases where the convexified regularizer does not have a closed-form proximal step, and when the loss function is nonconvex nonsmooth. Extensive experiments on a variety of machine learning application scenarios show that optimizing the transformed problem is much faster than running the state-of-the-art on the original problem.