ICLR 2022 • Xingyu Wang, Sewoong Oh, Chang-Han Rhee
The empirical success of deep learning is often attributed to SGD's mysterious ability to avoid sharp local minima in the loss landscape, as sharp minima are known to lead to poor generalization.