no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.
no code implementations • 16 Feb 2022 • Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang
We introduce Aryl, a new cluster scheduler to address these problems.
no code implementations • 27 Jan 2022 • Heting Liu, Zhichao Li, Cheng Tan, Rongqiu Yang, Guohong Cao, Zherui Liu, Chuanxiong Guo
To improve the precision and stability of predictions, we propose several techniques, including parallel and cascade model-ensemble mechanisms and a sliding training method.
no code implementations • 16 Dec 2021 • Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 20.68x on average.
no code implementations • 18 Sep 2021 • Cheng Tan, Zhichao Li, Jian Zhang, Yu Cao, Sikai Qi, Zherui Liu, Yibo Zhu, Chuanxiong Guo
With MIG, A100 can be the most cost-efficient GPU ever for serving Deep Neural Networks (DNNs).
1 code implementation • ICLR 2021 • Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy
This mutual-training process between BO and the loss-prediction model allows us to limit the training steps invested in the BO search.