no code implementations • 15 Apr 2024 • Daniil Merkulov, Daria Cherniuk, Alexander Rudikov, Ivan Oseledets, Ekaterina Muravleva, Aleksandr Mikhalev, Boris Kashin
In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation.
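A minimal sketch of the underlying intuition, not the Kashin-based algorithm from the paper: spreading a vector's energy with an orthogonal transform shrinks the dynamic range seen by a plain uniform quantizer, which is the effect Kashin representations exploit with stronger guarantees. All names and parameters below are illustrative.

```python
# Illustrative only: uniform quantization of a vector before and after a
# random orthogonal rotation that spreads its energy. The paper's method
# builds on Kashin representations, which give stronger worst-case bounds;
# this sketch only shows why a small dynamic range helps quantization.
import numpy as np

def uniform_quantize(x, bits=4):
    """Round x to a uniform grid over its own range, then dequantize back."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
x[:4] *= 50.0                                # a few outliers blow up the quantization step

Q, _ = np.linalg.qr(rng.standard_normal((512, 512)))   # random orthogonal matrix

direct = uniform_quantize(x)
rotated = Q.T @ uniform_quantize(Q @ x)                 # quantize in the rotated basis

print("error, direct :", np.linalg.norm(x - direct))
print("error, rotated:", np.linalg.norm(x - rotated))
```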
no code implementations • 2 Feb 2024 • Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, Ivan Oseledets
In this paper, we generalize and extend the idea of low-rank adaptation (LoRA) of large language models (LLMs) based on the Transformer architecture.
no code implementations • 6 Dec 2023 • Daria Cherniuk, Aleksandr Mikhalev, Ivan Oseledets
LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers.
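A minimal sketch of that idea, assuming a PyTorch setting: the pretrained weight is frozen and the update is parameterized as a product of two small trainable matrices, so only `rank * (in + out)` parameters are trained. The class and argument names are illustrative, not taken from the paper.

```python
# Sketch of a LoRA-style adapter: freeze the base linear layer and learn a
# low-rank update B @ A added to its output.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                    # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 2 * 8 * 768 = 12288 trainable parameters instead of 768 * 768
```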
no code implementations • 8 Aug 2023 • Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce the number of parameters and FLOPs in neural networks.
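As a minimal sketch of the general idea, assuming a PyTorch setting: a fully-connected layer's weight matrix can be factored with a truncated SVD and replaced by two smaller layers. The paper also addresses convolutional layers and richer tensor decompositions; this only illustrates the parameter saving, and the function name is hypothetical.

```python
# Replace one large linear layer with two smaller ones via truncated SVD.
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = torch.diag(S[:rank]) @ Vh[:rank]   # rank x in_features
    second.weight.data = U[:, :rank].clone()               # out_features x rank
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

fc = nn.Linear(1024, 1024)
compressed = factorize_linear(fc, rank=64)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(fc), "->", params(compressed))   # ~1.05M -> ~0.13M parameters
```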
no code implementations • 21 Feb 2022 • Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont
Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training.