1 code implementation • NeurIPS 2023 • Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark
We tentatively find that the frequency at which these quanta are used in the training distribution roughly follows a power law corresponding with the empirical scaling exponent for language models, a prediction of our theory.