Search Results for author: Chengpeng Li

Found 4 papers, 3 papers with code

Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation

no code implementations • 25 Oct 2023 • Chengpeng Li, Zhengyi Yang, Jizhi Zhang, Jiancan Wu, Dingxian Wang, Xiangnan He, Xiang Wang

Therefore, the data sparsity issue of reward signals and state transitions is very severe, while it has long been overlooked by existing RL recommenders. Worse still, RL methods learn through the trial-and-error mode, but negative feedback cannot be obtained in implicit feedback recommendation tasks, which aggravates the overestimation problem of offline RL recommender.

Contrastive Learning Offline RL +3

Paper
Add Code

Query and Response Augmentation Cannot Help Out-of-domain Math Reasoning Generalization

1 code implementation • 9 Oct 2023 • Chengpeng Li, Zheng Yuan, Hongyi Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, Chang Zhou

In this paper, we conduct an investigation for such data augmentation in math reasoning and are intended to answer: (1) What strategies of data augmentation are more effective; (2) What is the scaling relationship between the amount of augmented data and model performance; and (3) Can data augmentation incentivize generalization to out-of-domain mathematical reasoning tasks?

Ranked #51 on Math Word Problem Solving on MATH (using extra training data)

Arithmetic Reasoning Data Augmentation +3

167

Paper
Code

How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition

2 code implementations • 9 Oct 2023 • Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou

We propose four intriguing research questions to explore the association between model performance and various factors including data amount, composition ratio, model size and SFT strategies.

Code Generation Instruction Following +2

532

Paper
Code

Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

1 code implementation • 3 Aug 2023 • Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Keming Lu, Chuanqi Tan, Chang Zhou, Jingren Zhou

We find with augmented samples containing more distinct reasoning paths, RFT improves mathematical reasoning performance more for LLMs.

Ranked #101 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +1

167

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.