ACL ARR May 2021 • Mitchell A Gordon, Kevin Duh, Jared Kaplan
We observe that the development cross-entropy loss of supervised neural machine translation models scales like a power law with the amount of training data and the number of non-embedding parameters in the model.
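A power-law dependence of loss on data size and parameter count can be sketched as follows. This is a hedged illustration only: the additive functional form `L(N, D) = a*N^(-alpha) + b*D^(-beta) + c` and all coefficients below are assumptions for demonstration, not fitted values from the paper.

```python
import numpy as np

def dev_loss(n_params, n_data, a=1e3, alpha=0.5, b=1e4, beta=0.5, c=1.0):
    """Toy scaling-law predictor for development cross-entropy loss.

    n_params: number of non-embedding parameters (N)
    n_data:   amount of training data (D)
    The coefficients are illustrative placeholders, not measured values.
    """
    return a * n_params ** -alpha + b * n_data ** -beta + c

# Scaling up data and parameters lowers the predicted loss, with
# diminishing returns toward the irreducible term c.
small = dev_loss(1e7, 1e6)
large = dev_loss(4e7, 4e6)
print(f"small model/data: {small:.3f}")
print(f"large model/data: {large:.3f}")
```

Under a form like this, fitting the exponents on a few (N, D, loss) measurements lets one extrapolate how much data or capacity is needed to reach a target loss.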
1 Jan 2021 • Mitchell A Gordon
Approximation bounds for neural network pruning attempt to predict the trade-off between sparsity and fidelity as a network is shrunk.
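The sparsity/fidelity trade-off can be made concrete with a minimal sketch of global magnitude pruning on a random weight matrix, measuring fidelity as the relative Frobenius-norm error of the pruned weights. This is an assumed illustration of the trade-off such bounds aim to predict, not the paper's bound itself.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))  # stand-in for a trained weight matrix

def prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of entries."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    thresh = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) > thresh, weights, 0.0)

# Fidelity degrades as sparsity grows: the error a bound would try to cap.
for s in (0.5, 0.9, 0.99):
    err = np.linalg.norm(W - prune(W, s)) / np.linalg.norm(W)
    print(f"sparsity={s:.2f}  relative error={err:.3f}")
```

An approximation bound would upper-bound this error curve as a function of sparsity, letting one choose a sparsity level before pruning rather than measuring the degradation afterward.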