no code implementations • 16 Apr 2024 • Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu
However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO).
1 code implementation • 29 Jun 2023 • Zhiyu Mei, Wei Fu, Guangju Wang, Huanchen Zhang, Yi Wu
In a large-scale cluster, the novel architecture of SRL leads to up to 3.7x speedup compared to the design choices adopted by the existing libraries.