Search Results for author: Yury Tokpanov

Found 2 papers, 1 paper with code

Zamba: A Compact 7B SSM Hybrid Model

no code implementations • 26 May 2024 • Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge

Zamba is pretrained in two phases: the first uses existing web datasets, while the second anneals the model on high-quality instruct and synthetic datasets and is characterized by a rapid learning-rate decay.
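
The two-phase schedule described above can be sketched as follows. This is a minimal illustration, not Zamba's training code. The cosine decay in phase 1, the exponential decay in phase 2, the function name `two_phase_schedule`, and all default learning rates and step counts are hypothetical. The returned phase label stands in for switching the data mixture from web data to the high-quality instruct and synthetic mixture.

```python
import math


def two_phase_schedule(step: int, phase1_steps: int, phase2_steps: int,
                       peak_lr: float = 3e-4, phase1_min_lr: float = 1e-4,
                       final_lr: float = 1e-6) -> tuple[str, float]:
    """Return (phase, learning_rate) for a given optimizer step.

    Hypothetical sketch: phase 1 ("web") applies a cosine decay from peak_lr
    to phase1_min_lr; phase 2 ("anneal") decays exponentially from
    phase1_min_lr to final_lr over a much shorter span of steps.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < phase1_steps:
        progress = step / phase1_steps
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return "web", phase1_min_lr + (peak_lr - phase1_min_lr) * cosine
    # Clamp so that steps past the end of phase 2 keep the final learning rate.
    t = min(step - phase1_steps, phase2_steps) / phase2_steps
    return "anneal", phase1_min_lr * (final_lr / phase1_min_lr) ** t


for step in (0, 500, 1000, 1050, 1100):
    print(step, *two_phase_schedule(step, phase1_steps=1000, phase2_steps=100))
```

In this sketch, the learning-rate decay is rapid because the annealing phase is compressed into far fewer steps than phase 1. A real run would tune both spans and both decay shapes.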

BlackMamba: Mixture of Experts for State-Space Models

1 code implementation • 1 Feb 2024 • Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge

In this paper, we present BlackMamba, a novel architecture that combines the Mamba state-space model (SSM) with mixture-of-experts (MoE) to obtain the benefits of both.
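
The following is a rough sketch of one way to combine an SSM layer and an MoE layer in a residual block. Everything in it is an illustrative assumption. `diagonal_ssm` is a toy diagonal linear recurrence, not Mamba's input-dependent selective scan. `Top1MoE` uses a simple softmax router that sends each token to a single expert, and BlackMamba's actual routing and layer layout may differ. All names and sizes are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def diagonal_ssm(xs: List[Vector], a: Vector, b: Vector, c: Vector) -> List[Vector]:
    """Toy diagonal linear SSM scanned over a sequence (not Mamba's selective scan).

    Per channel i: h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t[i];  y_t[i] = c[i] * h_t[i].
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        h = [a_i * h_i + b_i * x_i for a_i, h_i, b_i, x_i in zip(a, h, b, x)]
        ys.append([c_i * h_i for c_i, h_i in zip(c, h)])
    return ys


def matvec(w: List[Vector], x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


class Top1MoE:
    """Hypothetical top-1 MoE MLP: a linear router picks one expert per token,
    and that expert's output is scaled by its softmax gate probability."""

    def __init__(self, dim: int, hidden: int, num_experts: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[Vector]:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.router = init(num_experts, dim)
        self.experts = [(init(hidden, dim), init(dim, hidden)) for _ in range(num_experts)]

    def __call__(self, x: Vector) -> Vector:
        logits = matvec(self.router, x)
        peak = max(logits)
        weights = [math.exp(v - peak) for v in logits]  # numerically stable softmax
        total = sum(weights)
        k = max(range(len(weights)), key=weights.__getitem__)  # chosen expert
        w_in, w_out = self.experts[k]
        hidden = [max(0.0, v) for v in matvec(w_in, x)]  # ReLU
        gate = weights[k] / total
        return [gate * v for v in matvec(w_out, hidden)]


def block(xs: List[Vector], ssm_params, moe: Top1MoE) -> List[Vector]:
    """One SSM layer then one MoE layer, each wrapped in a residual connection."""
    mixed = diagonal_ssm(xs, *ssm_params)
    xs = [[x_i + m_i for x_i, m_i in zip(x, m)] for x, m in zip(xs, mixed)]
    return [[x_i + f_i for x_i, f_i in zip(x, moe(x))] for x in xs]


dim = 4
seq = [[(t + i) / 10 for i in range(dim)] for t in range(3)]
moe = Top1MoE(dim, hidden=8, num_experts=4)
ssm = ([0.9] * dim, [1.0] * dim, [0.5] * dim)
out = block(seq, ssm, moe)
print(len(out), len(out[0]))  # 3 4
```

The sketch illustrates the intended pairing. The SSM layer mixes information across the sequence in time linear in its length. The MoE layer runs only one expert's parameters per token, so per-token compute stays small as experts are added.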

Language Modelling
