no code implementations • 23 May 2023 • Achraf Bahamou, Donald Goldfarb
We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization methods used to minimize empirical loss functions in deep learning, eliminating the need for the user to tune the learning rate (LR).
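As a rough illustration of the layer-wise idea (not the authors' actual procedure, which is derived in the paper), the sketch below computes a step size for one layer from a local quadratic model along its gradient; the Hessian-vector-product callable hess_vec and the Cauchy-style formula are assumptions made purely for illustration.

    import numpy as np

    def layer_step_size(grad, hess_vec, eps=1e-12):
        """Per-layer step from a local quadratic model: eta = g^T g / g^T H g.
        grad: flattened gradient of one layer; hess_vec: callable returning H @ v."""
        g_dot_g = float(grad @ grad)
        g_dot_Hg = float(grad @ hess_vec(grad))
        return g_dot_g / max(g_dot_Hg, eps)   # no hand-tuned base learning rate

    # Each layer l then takes its own step: w_l -= layer_step_size(g_l, hvp_l) * g_l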
no code implementations • 8 Feb 2022 • Achraf Bahamou, Donald Goldfarb, Yi Ren
Specifically, our method uses a block-diagonal approximation to the empirical Fisher matrix: for each layer in the DNN, whether convolutional or feed-forward and fully connected, the associated diagonal block is itself block-diagonal and is composed of a large number of mini-blocks of modest size.
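A minimal sketch of mini-block preconditioning is given below, assuming per-sample layer gradients stacked in a matrix G and contiguous mini-blocks of size b; this grouping is a simplification of the paper's layer-dependent block structure.

    import numpy as np

    def miniblock_fisher_direction(G, b, damping=1e-3):
        """G: (n_samples, d) per-sample gradients for one layer; b: mini-block size.
        Returns the damped mini-block-Fisher-preconditioned update direction."""
        n, d = G.shape
        g_mean = G.mean(axis=0)
        direction = np.empty_like(g_mean)
        for start in range(0, d, b):
            sl = slice(start, min(start + b, d))
            Fb = G[:, sl].T @ G[:, sl] / n          # mini-block of the empirical Fisher
            Fb += damping * np.eye(Fb.shape[0])     # damping keeps the block invertible
            direction[sl] = np.linalg.solve(Fb, g_mean[sl])
        return direction                            # parameter update: w -= lr * direction

Each mini-block solve stays cheap because the blocks are of modest size, which is the point of the structure described above.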
no code implementations • 9 Mar 2021 • Amine Allouah, Achraf Bahamou, Omar Besbes
For settings where the seller knows either the exact probability of sale associated with one historical price or only a confidence interval for it, we fully characterize the optimal achievable performance and develop near-optimal pricing algorithms that adjust to the information at hand.
Computer Science and Game Theory • Information Theory
no code implementations • 12 Feb 2021 • Yi Ren, Achraf Bahamou, Donald Goldfarb
We also propose several improvements to the methods in Goldfarb et al. (2020) that can be applied to both MLPs and CNNs.
1 code implementation • NeurIPS 2020 • Donald Goldfarb, Yi Ren, Achraf Bahamou
We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs).
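For orientation, the sketch below shows a plain per-layer BFGS inverse-Hessian update with a curvature-condition skip; the Kronecker-factored construction, gradient-change pairs, and damping used in the paper are more involved, so treat this only as the generic quasi-Newton building block.

    import numpy as np

    def bfgs_inverse_update(H, s, y, eps=1e-8):
        """H: current inverse-Hessian approximation for one layer's block;
        s: parameter change; y: (mini-batch) gradient change."""
        sy = float(s @ y)
        if sy <= eps * float(s @ s):                # skip when curvature is not safely positive
            return H
        rho = 1.0 / sy
        I = np.eye(H.shape[0])
        V = I - rho * np.outer(s, y)
        return V @ H @ V.T + rho * np.outer(s, s)   # standard BFGS update of the inverse Hessian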
no code implementations • ICML 2020 • Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani
We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$.
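A minimal example of a geometrically-driven step on O(d) is a Cayley retraction of the projected gradient, sketched below in NumPy; this is a standard Riemannian-SGD baseline, not the specific stochastic flows developed in the paper.

    import numpy as np

    def cayley_step(X, euclid_grad, lr):
        """One descent step on O(d). X: current orthogonal matrix; euclid_grad: dL/dX."""
        W = euclid_grad @ X.T - X @ euclid_grad.T   # skew-symmetric, so the Cayley map is orthogonal
        I = np.eye(X.shape[0])
        A = 0.5 * lr * W
        return np.linalg.solve(I + A, (I - A) @ X)  # iterate stays exactly on the orthogonal group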
no code implementations • 31 Dec 2019 • Achraf Bahamou, Donald Goldfarb
We also propose an adaptive version of ADAM that eliminates the need to tune the base learning rate and compares favorably with fine-tuned ADAM when training DNNs.