4 Jan 2024 • Mincong Huang, Chao Wang, Chi Ma, Yineng Zhang, Peng Zhang, Lei Yu
Pipeline parallelism is an essential technique in the training of large-scale Transformer models.