1 code implementation • 18 Oct 2023 • Faisal Al-Khateeb, Nolan Dey, Daria Soboleva, Joel Hestness
Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) to extrapolate to longer sequence lengths.
1 code implementation • 20 Sep 2023 • Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Bowen Yang, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming, Chen, Robert Myers, Jacob Robert Steeves, Natalia Vassilieva, Marvin Tom, Joel Hestness
BTLM-3B-8K is available under an Apache 2. 0 license on Hugging Face: https://huggingface. co/cerebras/btlm-3b-8k-base.