no code implementations • NeurIPS 2023 • David X. Wu, Anant Sahai
We study the asymptotic generalization of an overparameterized linear model for multiclass classification under the Gaussian covariates bi-level model introduced in Subramanian et al. '22, where the number of data points, features, and classes all grow together.
no code implementations • 24 Feb 2023 • David X. Wu, Chulhee Yun, Suvrit Sra
We uncover how SGD interacts with batch normalization and can exhibit undesirable training dynamics such as divergence.