no code implementations • 26 Feb 2024 • Minyoung Huh, Brian Cheung, Jeremy Bernstein, Phillip Isola, Pulkit Agrawal
The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication.
no code implementations • 26 Oct 2023 • Greg Yang, James B. Simon, Jeremy Bernstein
The push to train ever larger neural networks has motivated the study of initialization and training at large network width.
1 code implementation • 25 May 2023 • Benjamin Wright, Youngjae Min, Jeremy Bernstein, Navid Azizan
This paper proposes a memory-efficient solution to catastrophic forgetting, improving upon an established algorithm known as orthogonal gradient descent (OGD).
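For orientation, here is a minimal sketch of the baseline being improved upon, vanilla orthogonal gradient descent, assuming gradients from earlier tasks are stored and orthonormalised with Gram-Schmidt; the paper's memory-efficient variant is not reproduced here, and the function names are illustrative only.

```python
import numpy as np

def orthonormal_basis(vectors, eps=1e-10):
    """Gram-Schmidt orthonormalisation of stored gradient vectors from past tasks."""
    basis = []
    for v in vectors:
        w = v.copy()
        for b in basis:
            w -= np.dot(w, b) * b
        norm = np.linalg.norm(w)
        if norm > eps:
            basis.append(w / norm)
    return basis

def ogd_step(params, grad, basis, lr=0.1):
    """One step of vanilla OGD: strip from the new task's gradient every component
    lying along a stored direction, then take a plain gradient step, so the update
    does not interfere with directions important to previous tasks."""
    g = grad.copy()
    for b in basis:
        g -= np.dot(g, b) * b
    return params - lr * g

# toy usage: two stored gradients, one new gradient; only the third component survives projection
basis = orthonormal_basis([np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])])
print(ogd_step(np.zeros(3), np.array([0.3, -0.5, 0.8]), basis))
```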
1 code implementation • 11 Apr 2023 • Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue
Automatic gradient descent trains both fully-connected and convolutional networks out-of-the-box and at ImageNet scale.
1 code implementation • 18 Oct 2022 • Jeremy Bernstein
On generalisation, a new correspondence is proposed between ensembles of networks and individual networks.
1 code implementation • 8 May 2022 • Alexander R. Farhang, Jeremy Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue
Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$.
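A small worked example of why the normalised margin, rather than the raw margin, is the meaningful quantity: for a linear classifier it is the signed distance to the decision boundary and is invariant to rescaling the weights. This toy snippet is illustrative and not code from the paper.

```python
import numpy as np

def normalized_margin(w, X, y):
    """Smallest margin over the dataset, divided by the weight norm. For a linear
    classifier, y_i * (w @ x_i) / ||w|| is the signed distance of example i from
    the decision boundary, and it does not change if w is rescaled."""
    margins = y * (X @ w)
    return margins.min() / np.linalg.norm(w)

X = np.array([[2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 1.0])
print(normalized_margin(w, X, y))        # ~2.12
print(normalized_margin(10 * w, X, y))   # same value: scale-invariant
```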
1 code implementation • 8 Oct 2021 • Jeremy Bernstein, Alex Farhang, Yisong Yue
A Bayes point machine is a single classifier that approximates the majority decision of an ensemble of classifiers.
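A toy illustration of that idea, assuming a 2D linearly separable problem and a rejection-sampled ensemble of consistent linear classifiers: the centre of mass of the ensemble acts as a single "Bayes point" whose decisions closely track the ensemble's majority vote. This is only a sketch of the concept, not the paper's kernel-interpolation construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy linearly separable data
X = rng.normal(size=(10, 2))
y = np.sign(X @ np.array([1.0, -0.5]))

# rejection-sample unit-norm linear classifiers consistent with the data
ensemble = []
while len(ensemble) < 201:
    w = rng.normal(size=2)
    if np.all(np.sign(X @ w) == y):
        ensemble.append(w / np.linalg.norm(w))
ensemble = np.array(ensemble)

# the "Bayes point" is (roughly) the centre of mass of the consistent ensemble
bayes_point = ensemble.mean(axis=0)

# on fresh inputs, the single Bayes-point classifier tracks the ensemble's majority vote
X_test = rng.normal(size=(1000, 2))
majority = np.sign(np.sign(X_test @ ensemble.T).sum(axis=1))
single = np.sign(X_test @ bayes_point)
print("agreement with majority vote:", (majority == single).mean())
```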
no code implementations • 29 Sep 2021 • Jeremy Bernstein, Yisong Yue
Do neural networks generalise because of bias in the functions returned by gradient descent, or bias already present in the network architecture?
1 code implementation • 9 Jun 2021 • Dawna Bagherian, James Gornet, Jeremy Bernstein, Yu-Li Ni, Yisong Yue, Markus Meister
We study the problem of sparse nonlinear model recovery for high-dimensional compositional functions.
1 code implementation • 1 Mar 2021 • Jeremy Bernstein, Yisong Yue
A simple resolution to this conundrum is that the number of weights is usually a bad proxy for the actual amount of information stored.
2 code implementations • 14 Feb 2021 • Yang Liu, Jeremy Bernstein, Markus Meister, Yisong Yue
To address this problem, this paper conducts a combined study of neural architecture and optimisation, leading to a new optimiser called Nero: the neuronal rotator.
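A hedged sketch of the per-neuron constraint that the name "neuronal rotator" alludes to: after each step, every neuron's incoming weight vector is projected back to zero mean and unit norm, so updates rotate neurons rather than rescale them. The published Nero optimiser also normalises the gradient per neuron; that detail is omitted here, and the function names are illustrative.

```python
import numpy as np

def project_neuron(w):
    """Project one neuron's incoming weight vector to zero mean and unit Euclidean norm."""
    w = w - w.mean()
    return w / (np.linalg.norm(w) + 1e-12)

def constrained_step(W, grad, lr=0.01):
    """Sketch of a neuron-wise constrained update: a plain gradient step on each row
    (one row = one neuron's incoming weights), followed by re-projection of each row
    onto the zero-mean, unit-norm constraint set."""
    W = W - lr * grad
    return np.apply_along_axis(project_neuron, 1, W)

W = np.apply_along_axis(project_neuron, 1, np.random.randn(4, 8))   # start on the constraint set
W = constrained_step(W, np.random.randn(4, 8))
print(np.allclose(W.mean(axis=1), 0), np.allclose(np.linalg.norm(W, axis=1), 1))
```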
1 code implementation • NeurIPS 2020 • Jeremy Bernstein, Jia-Wei Zhao, Markus Meister, Ming-Yu Liu, Anima Anandkumar, Yisong Yue
This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions.
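A minimal sketch of a multiplicative weight update, the kind of step the descent lemma covers: each weight is scaled by an exponential factor, so the perturbation is relative to the weight's own magnitude. The paper's Madam optimiser uses a particular gradient normalisation that is not reproduced here; the crude sign-based normalisation below is an assumption made for illustration.

```python
import numpy as np

def multiplicative_update(w, grad, lr=0.01, eps=1e-12):
    """Sketch of a multiplicative weight update: scale each weight by
    exp(-lr * sign(w) * g_hat), so every weight moves by a fixed *relative*
    amount in the descent direction. (Illustrative only; the paper's Madam
    optimiser normalises the gradient differently.)"""
    g_hat = grad / (np.abs(grad) + eps)     # crude normalisation: just the sign of the gradient
    return w * np.exp(-lr * np.sign(w) * g_hat)

w = np.array([0.5, -2.0, 0.01])
g = np.array([1.0, -1.0, 1.0])
print(multiplicative_update(w, g))   # each weight shrinks or grows by ~1 percent of itself
```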
2 code implementations • NeurIPS 2020 • Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu
This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions.
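One practical consequence is a layer-wise *relative* update, sketched below under the assumption that the step on each layer is rescaled so the relative change in that layer's weights is roughly the learning rate. This simplifies the paper's Fromage optimiser, which also includes a correction against norm growth; the function name is illustrative.

```python
import numpy as np

def relative_step(W, grad, lr=0.01, eps=1e-12):
    """Sketch of a layer-wise relative update: rescale the gradient so that the
    step changes the layer's weights by a fixed fraction lr of their own norm,
    i.e. ||dW|| / ||W|| is approximately lr regardless of the gradient's scale."""
    step = lr * np.linalg.norm(W) / (np.linalg.norm(grad) + eps) * grad
    return W - step

W = np.random.randn(64, 32)
g = np.random.randn(64, 32)
W_new = relative_step(W, g, lr=0.01)
print(np.linalg.norm(W_new - W) / np.linalg.norm(W))   # ~0.01, independent of gradient scale
```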
3 code implementations • ICLR 2019 • Jeremy Bernstein, Jia-Wei Zhao, Kamyar Azizzadenesheli, Anima Anandkumar
Workers transmit only the sign of their gradient vector to a server, and the overall update is decided by a majority vote.
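A minimal sketch of that communication pattern: each worker sends one sign vector, and the server takes a coordinatewise majority vote by signing the sum of the votes. The fault-tolerance and convergence analysis are the paper's contribution and are not reproduced here.

```python
import numpy as np

def majority_vote_step(w, worker_grads, lr=0.001):
    """Distributed sign-based step: each worker contributes only the elementwise
    sign of its stochastic gradient; the server broadcasts the coordinatewise
    majority vote (the sign of the sum of signs) as the update direction."""
    votes = np.sign(np.stack([np.sign(g) for g in worker_grads]).sum(axis=0))
    return w - lr * votes

# toy usage with 5 workers holding noisy copies of the same gradient
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5, -0.1])
workers = [true_grad + rng.normal(scale=1.0, size=4) for _ in range(5)]
print(majority_vote_step(np.zeros(4), workers))
```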
1 code implementation • ICLR 2018 • Guneet S. Dhillon, Kamyar Azizzadenesheli, Zachary C. Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, Anima Anandkumar
Neural networks are known to be vulnerable to adversarial examples.
3 code implementations • ICML 2018 • Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar
Using a theorem by Gauss, we prove that majority vote can achieve the same reduction in variance as full-precision distributed SGD.
no code implementations • ICLR 2018 • Jeremy Bernstein, Kamyar Azizzadenesheli, Yu-Xiang Wang, Anima Anandkumar
The sign stochastic gradient descent method (signSGD) utilizes only the sign of the stochastic gradient in its updates.
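For reference, the basic update is a single line: each coordinate moves by a fixed amount opposite to the sign of its stochastic gradient, discarding the gradient's magnitude. A minimal sketch:

```python
import numpy as np

def signsgd_step(w, stochastic_grad, lr=0.001):
    """One signSGD step: keep only the elementwise sign of the stochastic gradient."""
    return w - lr * np.sign(stochastic_grad)

w = np.array([0.2, -1.3, 4.0])
g = np.array([10.0, -0.01, 0.0])    # magnitudes are ignored; only signs matter
print(signsgd_step(w, g))           # [0.199, -1.299, 4.0]
```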