On the Utility of Equal Batch Sizes for Inference in Stochastic Gradient Descent
Rahul Singh, Abhinek Shukla, Dootika Vats.
Year: 2025, Volume: 26, Issue: 258, Pages: 1−41
Abstract
Stochastic gradient descent (SGD) is an estimation tool for large data employed in machine learning and statistics. Due to the Markovian nature of the SGD process, inference is a challenging problem. The asymptotic normality of the averaged SGD (ASGD) estimator allows for the construction of a batch-means estimator of the asymptotic covariance matrix. Instead of the usual increasing batch-size strategy, we propose a memory-efficient equal batch-size strategy and show that, under mild conditions, the batch-means estimator is consistent. A key feature of the proposed batching technique is that it allows for bias correction of the variance at no additional memory cost. Further, since joint inference for large-dimensional problems may be undesirable, we present marginal-friendly simultaneous confidence intervals, and show through an example how covariance estimators of ASGD can be employed for improved predictions.
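To make the equal batch-size idea concrete, the following is a minimal sketch of a streaming batch-means covariance estimator. It assumes the classical batch-means form (the batch size times the sample covariance of the batch means) and is not a reproduction of the estimator, batch-size rule, or bias correction from the paper. The class name `EqualBatchMeans` and its methods are hypothetical, chosen only for illustration. With a fixed batch size, only a running batch sum, a sum of batch means, and a sum of their outer products need to be kept, so memory does not grow with the number of iterates.

```python
import numpy as np


class EqualBatchMeans:
    """Streaming batch-means covariance estimate with a fixed batch size.

    Illustrative sketch only: uses the classical batch-means formula,
    not necessarily the exact estimator analysed in the paper.
    """

    def __init__(self, dim, batch_size):
        self.b = batch_size
        self.batch_sum = np.zeros(dim)          # sum of iterates in the open batch
        self.count = 0                          # iterates seen in the open batch
        self.a = 0                              # number of completed batches
        self.mean_sum = np.zeros(dim)           # sum of completed batch means
        self.outer_sum = np.zeros((dim, dim))   # sum of outer products of batch means

    def update(self, theta):
        """Feed one SGD iterate; close the batch once it holds b iterates."""
        self.batch_sum += np.asarray(theta, dtype=float)
        self.count += 1
        if self.count == self.b:
            m = self.batch_sum / self.b
            self.mean_sum += m
            self.outer_sum += np.outer(m, m)
            self.a += 1
            self.batch_sum[:] = 0.0
            self.count = 0

    def covariance(self):
        """Return b / (a - 1) * sum_k (m_k - m_bar)(m_k - m_bar)^T."""
        if self.a < 2:
            raise ValueError("need at least two completed batches")
        grand = self.mean_sum / self.a
        centered = self.outer_sum - self.a * np.outer(grand, grand)
        return self.b * centered / (self.a - 1)
```

In use, `update` would be called once per SGD iterate and `covariance` whenever an estimate is needed; iterates in an incomplete final batch are ignored in this sketch.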