ICLR 2022 • Liu Ziyin, Botao Li, James B Simon, Masahito Ueda
Stochastic gradient descent (SGD) is widely used for the nonlinear, nonconvex problem of training deep neural networks, but its behavior remains poorly understood.
29 Sep 2021 • James B Simon, Madeline Dickens, Michael DeWeese
Finding a quantitative theory of neural network generalization has long been a central goal of deep learning research.