no code implementations • 15 Apr 2024 • Daniil Merkulov, Daria Cherniuk, Alexander Rudikov, Ivan Oseledets, Ekaterina Muravleva, Aleksandr Mikhalev, Boris Kashin
In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation.
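A minimal sketch of the underlying intuition, not the Kashin-based algorithm from the paper: spreading a vector's energy with an orthogonal transform shrinks the dynamic range seen by a plain uniform quantizer, which is the effect Kashin representations exploit with stronger guarantees. All names and parameters below are illustrative.

```python
# Illustrative only: uniform quantization of a vector before and after a
# random orthogonal rotation that spreads its energy. The paper's method
# builds on Kashin representations, which give stronger worst-case bounds;
# this sketch only shows why a small dynamic range helps quantization.
import numpy as np

def uniform_quantize(x, bits=4):
    """Round x to a uniform grid over its own range, then dequantize back."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
x[:4] *= 50.0                                # a few outliers blow up the quantization step

Q, _ = np.linalg.qr(rng.standard_normal((512, 512)))   # random orthogonal matrix

direct = uniform_quantize(x)
rotated = Q.T @ uniform_quantize(Q @ x)                 # quantize in the rotated basis

print("error, direct :", np.linalg.norm(x - direct))
print("error, rotated:", np.linalg.norm(x - rotated))
```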
no code implementations • 2 Feb 2024 • Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, Ivan Oseledets
In this paper, we generalize and extend the idea of low-rank adaptation (LoRA) of large language models (LLMs) based on the Transformer architecture.
no code implementations • 6 Dec 2023 • Daria Cherniuk, Aleksandr Mikhalev, Ivan Oseledets
LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers.
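A minimal sketch of that idea, assuming a PyTorch setting: the pretrained weight is frozen and the update is parameterized as a product of two small trainable matrices, so only `rank * (in + out)` parameters are trained. The class and argument names are illustrative, not taken from the paper.

```python
# Sketch of a LoRA-style adapter: freeze the base linear layer and learn a
# low-rank update B @ A added to its output.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                    # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 2 * 8 * 768 = 12288 trainable parameters instead of 768 * 768
```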
no code implementations • 8 Aug 2023 • Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce the number of parameters and FLOPs in neural networks.
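As a minimal sketch of the general idea, assuming a PyTorch setting: a fully-connected layer's weight matrix can be factored with a truncated SVD and replaced by two smaller layers. The paper also addresses convolutional layers and richer tensor decompositions; this only illustrates the parameter saving, and the function name is hypothetical.

```python
# Replace one large linear layer with two smaller ones via truncated SVD.
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = torch.diag(S[:rank]) @ Vh[:rank]   # rank x in_features
    second.weight.data = U[:, :rank].clone()               # out_features x rank
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

fc = nn.Linear(1024, 1024)
compressed = factorize_linear(fc, rank=64)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(fc), "->", params(compressed))   # ~1.05M -> ~0.13M parameters
```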
no code implementations • 21 Feb 2022 • Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont
Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training.