SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality
Courtney Paquette , Kiwon Lee , Fabian Pedregosa , Elliot Paquette
Session: Stochastic Optimization (A)
Session Chair: Brian Bullins
Poster: Poster Session 4
Abstract:
We propose a new framework, inspired by random matrix theory, for analyzing the dynamics of stochastic gradient descent (SGD) when both number of samples and dimensions are large. This framework applies to any fixed stepsize and on the least squares problem with random data (finite-sum). Using this new framework, we show that the dynamics of SGD become deterministic in the large sample and dimensional limit. Furthermore, the limiting dynamics are governed by a Volterra integral equation. This model predicts that SGD undergoes a phase transition at an explicitly given critical stepsize that ultimately affects its convergence rate, which is also verified experimentally. Finally, when input data is isotropic, we provide explicit expressions for the dynamics and average-case convergence rates (i.e., the complexity of an algorithm averaged over all possible inputs). These rates show significant improvement over classical worst-case complexities.