Search Results for author: Natalia Vassilieva

Found 4 papers, 2 papers with code

SlimPajama-DC: Understanding Data Combinations for LLM Training

1 code implementation • 19 Sep 2023 • Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, Hongyi Wang, Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing

This paper aims to understand the impacts of various data combinations (e.g., web text, Wikipedia, GitHub, books) on the pretraining of large language models using SlimPajama.
