Search Results for author: Yury Tokpanov

Found 2 papers, 1 paper with code

Zamba: A Compact 7B SSM Hybrid Model

no code implementations • 26 May 2024 • Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge

Zamba is pretrained in two phases: the first phase is based on existing web datasets, while the second one consists of annealing the model over high-quality instruct and synthetic datasets, and is characterized by a rapid learning rate decay.

BlackMamba: Mixture of Experts for State-Space Models

1 code implementation • 1 Feb 2024 • Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge

In this paper, we present BlackMamba, a novel architecture that combines the Mamba SSM with MoE to obtain the benefits of both.

Language Modelling
