Search Results for author: Bor-Yiing Su

Found 3 papers, 0 papers with code

CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery

no code implementations • 5 Nov 2020 • Kiwan Maeng, Shivam Bharuka, Isabel Gao, Mark C. Jeffrey, Vikram Saraph, Bor-Yiing Su, Caroline Trippel, Jiyan Yang, Mike Rabbat, Brandon Lucia, Carole-Jean Wu

To the best of our knowledge, this paper is the first to perform a data-driven, in-depth analysis of applying partial recovery to recommendation models, and it identifies a trade-off between accuracy and performance.

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems

no code implementations • 20 Mar 2020 • Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, Krishnakumar Nair, Isabel Gao, Bor-Yiing Su, Jiyan Yang, Mikhail Smelyanskiy

Large-scale training is important to ensure high performance and accuracy of machine-learning models.

Distributed, Parallel, and Cluster Computing • MSC: 68T05, 68M10 • ACM: H.3.3; I.2.6; C.2.1

ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training

no code implementations • 7 Mar 2020 • Qinqing Zheng, Bor-Yiing Su, Jiyan Yang, Alisson Azzolini, Qiang Wu, Ou Jin, Shri Karandikar, Hagay Lupesko, Liang Xiong, Eric Zhou

Recommendation systems are often trained with a tremendous amount of data, and distributed training is the workhorse to shorten the training time.

Click-Through Rate Prediction • Recommendation Systems
