no code implementations • 26 Feb 2024 • Minyoung Huh, Brian Cheung, Jeremy Bernstein, Phillip Isola, Pulkit Agrawal
The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication.
no code implementations • 26 Oct 2023 • Greg Yang, James B. Simon, Jeremy Bernstein
The push to train ever larger neural networks has motivated the study of initialization and training at large network width.
1 code implementation • 25 May 2023 • Benjamin Wright, Youngjae Min, Jeremy Bernstein, Navid Azizan
This paper proposes a memory-efficient solution to catastrophic forgetting, improving upon an established algorithm known as orthogonal gradient descent (OGD).
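For orientation, here is a minimal sketch of the baseline being improved upon, vanilla orthogonal gradient descent, assuming gradients from earlier tasks are stored and orthonormalised with Gram-Schmidt; the paper's memory-efficient variant is not reproduced here, and the function names are illustrative only.

```python
import numpy as np

def orthonormal_basis(vectors, eps=1e-10):
    """Gram-Schmidt orthonormalisation of stored gradient vectors from past tasks."""
    basis = []
    for v in vectors:
        w = v.copy()
        for b in basis:
            w -= np.dot(w, b) * b
        norm = np.linalg.norm(w)
        if norm > eps:
            basis.append(w / norm)
    return basis

def ogd_step(params, grad, basis, lr=0.1):
    """One step of vanilla OGD: strip from the new task's gradient every component
    lying along a stored direction, then take a plain gradient step, so the update
    does not interfere with directions important to previous tasks."""
    g = grad.copy()
    for b in basis:
        g -= np.dot(g, b) * b
    return params - lr * g

# toy usage: two stored gradients, one new gradient; only the third component survives projection
basis = orthonormal_basis([np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])])
print(ogd_step(np.zeros(3), np.array([0.3, -0.5, 0.8]), basis))
```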
1 code implementation • 11 Apr 2023 • Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue
Automatic gradient descent trains both fully-connected and convolutional networks out-of-the-box and at ImageNet scale.
1 code implementation • 18 Oct 2022 • Jeremy Bernstein
On generalisation, a new correspondence is proposed between ensembles of networks and individual networks.
1 code implementation • 8 May 2022 • Alexander R. Farhang, Jeremy Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue
Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$.
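A small worked example of why the normalised margin, rather than the raw margin, is the meaningful quantity: for a linear classifier it is the signed distance to the decision boundary and is invariant to rescaling the weights. This toy snippet is illustrative and not code from the paper.

```python
import numpy as np

def normalized_margin(w, X, y):
    """Smallest margin over the dataset, divided by the weight norm. For a linear
    classifier, y_i * (w @ x_i) / ||w|| is the signed distance of example i from
    the decision boundary, and it does not change if w is rescaled."""
    margins = y * (X @ w)
    return margins.min() / np.linalg.norm(w)

X = np.array([[2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 1.0])
print(normalized_margin(w, X, y))        # ~2.12
print(normalized_margin(10 * w, X, y))   # same value: scale-invariant
```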
1 code implementation • 8 Oct 2021 • Jeremy Bernstein, Alex Farhang, Yisong Yue
A Bayes point machine is a single classifier that approximates the majority decision of an ensemble of classifiers.
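A toy illustration of that idea, assuming a 2D linearly separable problem and a rejection-sampled ensemble of consistent linear classifiers: the centre of mass of the ensemble acts as a single "Bayes point" whose decisions closely track the ensemble's majority vote. This is only a sketch of the concept, not the paper's kernel-interpolation construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy linearly separable data
X = rng.normal(size=(10, 2))
y = np.sign(X @ np.array([1.0, -0.5]))

# rejection-sample unit-norm linear classifiers consistent with the data
ensemble = []
while len(ensemble) < 201:
    w = rng.normal(size=2)
    if np.all(np.sign(X @ w) == y):
        ensemble.append(w / np.linalg.norm(w))
ensemble = np.array(ensemble)

# the "Bayes point" is (roughly) the centre of mass of the consistent ensemble
bayes_point = ensemble.mean(axis=0)

# on fresh inputs, the single Bayes-point classifier tracks the ensemble's majority vote
X_test = rng.normal(size=(1000, 2))
majority = np.sign(np.sign(X_test @ ensemble.T).sum(axis=1))
single = np.sign(X_test @ bayes_point)
print("agreement with majority vote:", (majority == single).mean())
```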
no code implementations • 29 Sep 2021 • Jeremy Bernstein, Yisong Yue
Do neural networks generalise because of bias in the functions returned by gradient descent, or bias already present in the network architecture?
1 code implementation • 9 Jun 2021 • Dawna Bagherian, James Gornet, Jeremy Bernstein, Yu-Li Ni, Yisong Yue, Markus Meister
We study the problem of sparse nonlinear model recovery for high-dimensional compositional functions.
1 code implementation • 1 Mar 2021 • Jeremy Bernstein, Yisong Yue
A simple resolution to this conundrum is that the number of weights is usually a bad proxy for the actual amount of information stored.
2 code implementations • 14 Feb 2021 • Yang Liu, Jeremy Bernstein, Markus Meister, Yisong Yue
To address this problem, this paper conducts a combined study of neural architecture and optimisation, leading to a new optimiser called Nero: the neuronal rotator.
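A hedged sketch of the per-neuron constraint that the name "neuronal rotator" alludes to: after each step, every neuron's incoming weight vector is projected back to zero mean and unit norm, so updates rotate neurons rather than rescale them. The published Nero optimiser also normalises the gradient per neuron; that detail is omitted here, and the function names are illustrative.

```python
import numpy as np

def project_neuron(w):
    """Project one neuron's incoming weight vector to zero mean and unit Euclidean norm."""
    w = w - w.mean()
    return w / (np.linalg.norm(w) + 1e-12)

def constrained_step(W, grad, lr=0.01):
    """Sketch of a neuron-wise constrained update: a plain gradient step on each row
    (one row = one neuron's incoming weights), followed by re-projection of each row
    onto the zero-mean, unit-norm constraint set."""
    W = W - lr * grad
    return np.apply_along_axis(project_neuron, 1, W)

W = np.apply_along_axis(project_neuron, 1, np.random.randn(4, 8))   # start on the constraint set
W = constrained_step(W, np.random.randn(4, 8))
print(np.allclose(W.mean(axis=1), 0), np.allclose(np.linalg.norm(W, axis=1), 1))
```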
1 code implementation • NeurIPS 2020 • Jeremy Bernstein, Jia-Wei Zhao, Markus Meister, Ming-Yu Liu, Anima Anandkumar, Yisong Yue
This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions.
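A minimal sketch of a multiplicative weight update, the kind of step the descent lemma covers: each weight is scaled by an exponential factor, so the perturbation is relative to the weight's own magnitude. The paper's Madam optimiser uses a particular gradient normalisation that is not reproduced here; the crude sign-based normalisation below is an assumption made for illustration.

```python
import numpy as np

def multiplicative_update(w, grad, lr=0.01, eps=1e-12):
    """Sketch of a multiplicative weight update: scale each weight by
    exp(-lr * sign(w) * g_hat), so every weight moves by a fixed *relative*
    amount in the descent direction. (Illustrative only; the paper's Madam
    optimiser normalises the gradient differently.)"""
    g_hat = grad / (np.abs(grad) + eps)     # crude normalisation: just the sign of the gradient
    return w * np.exp(-lr * np.sign(w) * g_hat)

w = np.array([0.5, -2.0, 0.01])
g = np.array([1.0, -1.0, 1.0])
print(multiplicative_update(w, g))   # each weight shrinks or grows by ~1 percent of itself
```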
2 code implementations • NeurIPS 2020 • Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu
This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions.
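One practical consequence is a layer-wise *relative* update, sketched below under the assumption that the step on each layer is rescaled so the relative change in that layer's weights is roughly the learning rate. This simplifies the paper's Fromage optimiser, which also includes a correction against norm growth; the function name is illustrative.

```python
import numpy as np

def relative_step(W, grad, lr=0.01, eps=1e-12):
    """Sketch of a layer-wise relative update: rescale the gradient so that the
    step changes the layer's weights by a fixed fraction lr of their own norm,
    i.e. ||dW|| / ||W|| is approximately lr regardless of the gradient's scale."""
    step = lr * np.linalg.norm(W) / (np.linalg.norm(grad) + eps) * grad
    return W - step

W = np.random.randn(64, 32)
g = np.random.randn(64, 32)
W_new = relative_step(W, g, lr=0.01)
print(np.linalg.norm(W_new - W) / np.linalg.norm(W))   # ~0.01, independent of gradient scale
```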
3 code implementations • ICLR 2019 • Jeremy Bernstein, Jia-Wei Zhao, Kamyar Azizzadenesheli, Anima Anandkumar
Workers transmit only the sign of their gradient vector to a server, and the overall update is decided by a majority vote.
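A minimal sketch of that communication pattern: each worker sends one sign vector, and the server takes a coordinatewise majority vote by signing the sum of the votes. The fault-tolerance and convergence analysis are the paper's contribution and are not reproduced here.

```python
import numpy as np

def majority_vote_step(w, worker_grads, lr=0.001):
    """Distributed sign-based step: each worker contributes only the elementwise
    sign of its stochastic gradient; the server broadcasts the coordinatewise
    majority vote (the sign of the sum of signs) as the update direction."""
    votes = np.sign(np.stack([np.sign(g) for g in worker_grads]).sum(axis=0))
    return w - lr * votes

# toy usage with 5 workers holding noisy copies of the same gradient
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5, -0.1])
workers = [true_grad + rng.normal(scale=1.0, size=4) for _ in range(5)]
print(majority_vote_step(np.zeros(4), workers))
```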
1 code implementation • ICLR 2018 • Guneet S. Dhillon, Kamyar Azizzadenesheli, Zachary C. Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, Anima Anandkumar
Neural networks are known to be vulnerable to adversarial examples.
3 code implementations • ICML 2018 • Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar
Using a theorem by Gauss, we prove that majority vote can achieve the same reduction in variance as full-precision distributed SGD.
no code implementations • ICLR 2018 • Jeremy Bernstein, Kamyar Azizzadenesheli, Yu-Xiang Wang, Anima Anandkumar
The sign stochastic gradient descent method (signSGD) utilizes only the sign of the stochastic gradient in its updates.
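For reference, the basic update is a single line: each coordinate moves by a fixed amount opposite to the sign of its stochastic gradient, discarding the gradient's magnitude. A minimal sketch:

```python
import numpy as np

def signsgd_step(w, stochastic_grad, lr=0.001):
    """One signSGD step: keep only the elementwise sign of the stochastic gradient."""
    return w - lr * np.sign(stochastic_grad)

w = np.array([0.2, -1.3, 4.0])
g = np.array([10.0, -0.01, 0.0])    # magnitudes are ignored; only signs matter
print(signsgd_step(w, g))           # [0.199, -1.299, 4.0]
```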