no code implementations • 25 Oct 2023 • Chengpeng Li, Zhengyi Yang, Jizhi Zhang, Jiancan Wu, Dingxian Wang, Xiangnan He, Xiang Wang
Therefore, the data sparsity of reward signals and state transitions is severe, yet it has long been overlooked by existing RL recommenders. Worse still, RL methods learn through trial and error, but negative feedback cannot be obtained in implicit-feedback recommendation tasks, which aggravates the overestimation problem of offline RL recommenders.
1 code implementation • 9 Oct 2023 • Chengpeng Li, Zheng Yuan, Hongyi Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, Chang Zhou
In this paper, we investigate such data augmentation in math reasoning and aim to answer: (1) Which data augmentation strategies are more effective; (2) What is the scaling relationship between the amount of augmented data and model performance; and (3) Can data augmentation incentivize generalization to out-of-domain mathematical reasoning tasks?
Ranked #51 on Math Word Problem Solving on MATH (using extra training data)
2 code implementations • 9 Oct 2023 • Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou
We propose four intriguing research questions to explore the association between model performance and various factors including data amount, composition ratio, model size and SFT strategies.
1 code implementation • 3 Aug 2023 • Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Keming Lu, Chuanqi Tan, Chang Zhou, Jingren Zhou
We find that, with augmented samples containing more distinct reasoning paths, RFT improves the mathematical reasoning performance of LLMs more.
Ranked #101 on Arithmetic Reasoning on GSM8K (using extra training data)