no code implementations • 30 Nov 2021 • X. Y. Han, Adrian S. Lewis
For smooth, strongly convex objectives, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations.
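As a minimal sketch of this classical result (not code from the paper; the quadratic objective, step size, and iteration count below are illustrative choices): gradient descent on a strongly convex, smooth function contracts the suboptimality gap by a constant factor per step.

```python
import numpy as np

# Sketch: gradient descent on f(x) = 0.5 * x^T A x, a strongly convex,
# smooth quadratic with mu = smallest and L = largest eigenvalue of A.
# Classical theory gives f(x_k) - f* <= (1 - mu/L)^k * (f(x_0) - f*),
# i.e. linear convergence in the number of gradient evaluations.
A = np.diag([1.0, 10.0])        # hypothetical problem: mu = 1, L = 10
f = lambda x: 0.5 * x @ A @ x   # minimizer x* = 0, optimal value f* = 0
grad = lambda x: A @ x

x = np.array([1.0, 1.0])
step = 1.0 / 10.0               # standard step size 1/L
for k in range(25):
    x = x - step * grad(x)      # gap shrinks geometrically in k
print(f(x))                     # ~2.6e-3 after 25 steps, down from 5.5
```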
1 code implementation • ICLR 2022 • X. Y. Han, Vardan Papyan, David L. Donoho
The analytically tractable MSE loss offers more mathematical opportunities than the hard-to-analyze cross-entropy (CE) loss, inspiring us to leverage the MSE loss in the theoretical investigation of Neural Collapse (NC).
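A hedged sketch of the contrast (not the paper's code; batch size, class count, and tensor names are hypothetical): the MSE loss for classification compares logits to one-hot targets, making it a quadratic and hence analytically tractable, whereas CE composes a log-softmax with the logits.

```python
import torch
import torch.nn.functional as F

# Illustrative only: the two classification losses on the same outputs.
logits = torch.randn(8, 10)                # hypothetical: batch of 8, 10 classes
labels = torch.randint(0, 10, (8,))

ce_loss = F.cross_entropy(logits, labels)  # CE: log-softmax, harder to analyze

# MSE against one-hot targets: a quadratic in the logits, which is what
# makes it amenable to exact mathematical analysis.
one_hot = F.one_hot(labels, num_classes=10).float()
mse_loss = F.mse_loss(logits, one_hot)
```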
1 code implementation • 18 Aug 2020 • Vardan Papyan, X. Y. Han, David L. Donoho
Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero while the training loss is pushed towards zero.
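To make the definition concrete, a minimal sketch (assumed per-epoch metric histories, not the paper's code) of locating the TPT onset from training curves:

```python
# Hedged sketch: find the start of the Terminal Phase of Training (TPT)
# from per-epoch training metrics (hypothetical lists, one entry per epoch).
train_error = [0.31, 0.12, 0.04, 0.0, 0.0, 0.0]   # fraction misclassified
train_loss  = [1.20, 0.55, 0.21, 0.09, 0.05, 0.02]

# TPT begins at the first epoch where training error vanishes; thereafter
# the error stays (effectively) zero while the loss keeps being driven down.
tpt_start = next(e for e, err in enumerate(train_error) if err == 0.0)
assert all(err == 0.0 for err in train_error[tpt_start:])
assert all(a >= b for a, b in zip(train_loss[tpt_start:],
                                  train_loss[tpt_start + 1:]))
print(f"TPT begins at epoch {tpt_start}")
```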