no code implementations • 30 Apr 2024 • Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang
The Mixture-of-Experts (MoE) technique plays a crucial role in scaling up the parameter count of DNN models.
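To make the MoE idea concrete, here is a minimal top-1-routed MoE layer sketch in NumPy. This is purely illustrative and not the system described in the paper: the expert count, dimensions, and the `moe_forward` helper are all assumptions for the example. Each token's gating score selects one expert (a small linear map), so total parameters grow with the number of experts while each token only touches one expert's weights.

```python
import numpy as np

# Illustrative top-1 Mixture-of-Experts layer (NOT the paper's system).
# Each expert is a linear map; a gating network routes each token to the
# expert with the highest gate score.
rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 16

experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    # x: (n_tokens, d_model)
    logits = x @ gate_w              # gating scores, (n_tokens, n_experts)
    choice = logits.argmax(axis=1)   # top-1 expert index per token
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            # only the tokens routed here pass through expert e's weights
            out[mask] = x[mask] @ experts[e]
    return out, choice

x = rng.standard_normal((n_tokens, d_model))
y, choice = moe_forward(x)
```

Because each token activates only one of the four experts, compute per token stays roughly constant even as parameters scale with the expert count, which is the property that makes MoE attractive for growing model size.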
2 code implementations • 17 Nov 2023 • Chenyu Jiang, Zhen Jia, Shuai Zheng, Yida Wang, Chuan Wu
This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training.
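One simple way to illustrate length-aware micro-batching is shown below. This is a hypothetical sketch, not the paper's algorithm: the `make_microbatches` function and the token-budget heuristic are assumptions for the example. Sorting sequences by length and packing them under a padded-token budget keeps sequences of similar length together, reducing wasted padding when each micro-batch is padded to its longest member.

```python
# Hypothetical sketch of length-aware dynamic micro-batching (not the
# paper's implementation): sort sequences by length, then greedily pack
# them into micro-batches under a padded-token budget.

def make_microbatches(lengths, token_budget):
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, cur = [], []
    for i in order:
        # padded cost of a micro-batch = batch size * longest sequence in it
        max_len = max(lengths[i], max((lengths[j] for j in cur), default=0))
        if cur and (len(cur) + 1) * max_len > token_budget:
            batches.append(cur)  # budget exceeded: start a new micro-batch
            cur = []
        cur.append(i)
    if cur:
        batches.append(cur)
    return batches

lengths = [5, 37, 12, 9, 40, 7, 33]
mbs = make_microbatches(lengths, token_budget=64)
```

With this input the short sequences (lengths 5, 7, 9, 12) pack into one micro-batch while the long ones split off, so no micro-batch pays to pad a 5-token sequence out to 40 tokens.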
no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.