no code implementations • 5 Jun 2023 • Viktoriia Chekalina, Georgii Novikov, Julia Gusak, Ivan Oseledets, Alexander Panchenko
On the downstream tasks, including language understanding and text summarization, the model performs similarly to the original GPT-2 model.
2 code implementations • 1 Feb 2022 • Georgii Novikov, Daniel Bershatsky, Julia Gusak, Alex Shonenkov, Denis Dimitrov, Ivan Oseledets
Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operation induces additional memory costs which -- as we show -- can be significantly reduced by quantization of the gradients.