no code implementations • 30 Apr 2024 • Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang
The Mixture-of-Experts (MoE) technique plays a crucial role in scaling up the parameter count of DNN models.
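To make the MoE idea concrete, here is a minimal top-1-routed MoE layer sketch in NumPy. This is purely illustrative and not the system described in the paper: the expert count, dimensions, and the `moe_forward` helper are all assumptions for the example. Each token's gating score selects one expert (a small linear map), so total parameters grow with the number of experts while each token only touches one expert's weights.

```python
import numpy as np

# Illustrative top-1 Mixture-of-Experts layer (NOT the paper's system).
# Each expert is a linear map; a gating network routes each token to the
# expert with the highest gate score.
rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 16

experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    # x: (n_tokens, d_model)
    logits = x @ gate_w              # gating scores, (n_tokens, n_experts)
    choice = logits.argmax(axis=1)   # top-1 expert index per token
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            # only the tokens routed here pass through expert e's weights
            out[mask] = x[mask] @ experts[e]
    return out, choice

x = rng.standard_normal((n_tokens, d_model))
y, choice = moe_forward(x)
```

Because each token activates only one of the four experts, compute per token stays roughly constant even as parameters scale with the expert count, which is the property that makes MoE attractive for growing model size.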
2 code implementations • 17 Nov 2023 • Chenyu Jiang, Zhen Jia, Shuai Zheng, Yida Wang, Chuan Wu
This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training.
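One simple way to illustrate length-aware micro-batching is shown below. This is a hypothetical sketch, not the paper's algorithm: the `make_microbatches` function and the token-budget heuristic are assumptions for the example. Sorting sequences by length and packing them under a padded-token budget keeps sequences of similar length together, reducing wasted padding when each micro-batch is padded to its longest member.

```python
# Hypothetical sketch of length-aware dynamic micro-batching (not the
# paper's implementation): sort sequences by length, then greedily pack
# them into micro-batches under a padded-token budget.

def make_microbatches(lengths, token_budget):
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, cur = [], []
    for i in order:
        # padded cost of a micro-batch = batch size * longest sequence in it
        max_len = max(lengths[i], max((lengths[j] for j in cur), default=0))
        if cur and (len(cur) + 1) * max_len > token_budget:
            batches.append(cur)  # budget exceeded: start a new micro-batch
            cur = []
        cur.append(i)
    if cur:
        batches.append(cur)
    return batches

lengths = [5, 37, 12, 9, 40, 7, 33]
mbs = make_microbatches(lengths, token_budget=64)
```

With this input the short sequences (lengths 5, 7, 9, 12) pack into one micro-batch while the long ones split off, so no micro-batch pays to pad a 5-token sequence out to 40 tokens.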
no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.