Search Results for author: Miaosen Zhang

Found 2 papers, 2 papers with code

Transformer as Linear Expansion of Learngene

1 code implementation • 9 Dec 2023 • Shiyu Xia, Miaosen Zhang, Xu Yang, Ruiming Chen, Haokun Chen, Xin Geng

When models of varying depths must be produced to fit different resource constraints, TLEG achieves comparable results while reducing the parameters stored to initialize these models by around 19x and pre-training costs by around 5x, compared with the pre-training and fine-tuning approach.


FP8-LM: Training FP8 Large Language Models

1 code implementation • 27 Oct 2023 • Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu, Ruihang Li, Miaosen Zhang, Chen Li, Jia Ning, Ruizhe Wang, Zheng Zhang, Shuguang Liu, Joe Chau, Han Hu, Peng Cheng

In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs).


