Search Results for author: Denis Mazur

Found 4 papers, 4 papers with code

PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

1 code implementation • 23 May 2024 • Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtarik

In this work, we question the use of STE for extreme LLM compression, showing that it can be sub-optimal, and perform a systematic study of quantization-aware fine-tuning strategies for LLMs.

Quantization
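The abstract above questions the straight-through estimator (STE), the standard trick for training through a non-differentiable quantizer. A minimal sketch of the idea (all values here are illustrative, not from the paper): the quantizer's true derivative is zero almost everywhere, so STE substitutes the identity on the backward pass.

```python
import numpy as np

def quantize(w, step=0.5):
    """Round-to-nearest uniform quantizer (piecewise constant)."""
    return np.round(w / step) * step

# The true derivative of the quantizer is zero almost everywhere,
# so backpropagating through it directly stalls training.
w = 0.3
eps = 1e-4
true_grad = (quantize(w + eps) - quantize(w - eps)) / (2 * eps)

# Straight-through estimator: treat the quantizer as the identity
# on the backward pass, so upstream gradients reach the latent
# full-precision weights unchanged.
upstream_grad = 1.7          # dL/d(quantize(w)), arbitrary example value
ste_grad = upstream_grad     # dL/dw under STE

print(true_grad, ste_grad)   # 0.0 1.7
```

The paper's point is that this identity substitution is a biased approximation, which can be sub-optimal at extreme compression ratios.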

Fast Inference of Mixture-of-Experts Language Models with Offloading

1 code implementation • 28 Dec 2023 • Artyom Eliseev, Denis Mazur

In this work, we study the problem of running large MoE language models on consumer hardware with limited accelerator memory.

Quantization
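Offloading a Mixture-of-Experts model means keeping only some experts resident in accelerator memory and fetching the rest from host RAM as the router requests them. A toy sketch of one common caching policy (an LRU cache; class and function names here are hypothetical, not the paper's implementation):

```python
from collections import OrderedDict

class ExpertCache:
    """Toy model of limited accelerator memory: only `capacity`
    experts are resident at once; the rest live in host RAM and
    are copied in on demand (illustrative, not the paper's code)."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn      # fetches an expert from host memory
        self.resident = OrderedDict()
        self.loads = 0              # counts simulated host->device copies

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)   # cache hit
        else:
            self.loads += 1                        # cache miss: copy in
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)  # evict least recently used
            self.resident[expert_id] = self.load_fn(expert_id)
        return self.resident[expert_id]

cache = ExpertCache(capacity=2, load_fn=lambda i: f"weights[{i}]")
for expert in [0, 1, 0, 2, 0, 1]:   # router choices, token by token
    cache.get(expert)
print(cache.loads)  # 4
```

Because routers tend to reuse a few experts across consecutive tokens, cache hits keep most forward passes from paying the host-to-device transfer cost.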
