D2PO: Discriminator-Guided DPO with Response Evaluation Models
1 code implementation • 2 May 2024 • Prasann Singhal, Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett
Varied approaches for aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO.
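For concreteness, the DPO objective mentioned above reduces to a simple loss over preference pairs. Below is a minimal PyTorch sketch of standard DPO (Rafailov et al.), assuming per-response log-probabilities have already been summed under the policy and a frozen reference model; it illustrates the baseline objective, not this paper's discriminator-guided variant.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        # Log-ratios of the policy against the frozen reference model
        chosen_ratio = policy_chosen_logps - ref_chosen_logps
        rejected_ratio = policy_rejected_logps - ref_rejected_logps
        # -log sigmoid(beta * margin): minimized when preferred responses
        # gain probability relative to dispreferred ones
        return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()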
A Long Way to Go: Investigating Length Correlations in RLHF
1 code implementation • 5 Oct 2023 • Prasann Singhal, Tanya Goyal, Jiacheng Xu, Greg Durrett
Furthermore, we find that even running RLHF with a reward based solely on length can reproduce most of the downstream improvements over the initial policy model, showing that reward models in these settings have a long way to go.
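To make "a reward based solely on length" concrete, here is a hedged sketch of such a reward; the whitespace token count, cap, and normalization are illustrative assumptions, not the paper's exact parameterization. Dropping a function like this into a standard RLHF loop in place of a learned reward model is the kind of experiment described above.

    def length_only_reward(response: str, cap: int = 256) -> float:
        # Hypothetical length-only reward: more whitespace tokens means
        # higher reward, saturating at `cap` so optimization cannot grow
        # outputs without bound.
        n_tokens = len(response.split())
        return min(n_tokens, cap) / cap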
EEL: Efficiently Encoding Lattices for Reranking
1 code implementation • 1 Jun 2023 • Prasann Singhal, Jiacheng Xu, Xi Ye, Greg Durrett
Standard decoding approaches for conditional text generation tasks typically search for an output hypothesis with high model probability, but this may not yield the best hypothesis according to human judgments of quality.
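One way to act on that observation is to generate several hypotheses and rerank them with an external quality estimate instead of trusting model probability alone. The sketch below shows that generic rerank pattern; model_logprob, quality_score, and the interpolation weight alpha are hypothetical stand-ins, and the paper's lattice-encoding machinery for doing this efficiently is not shown here.

    from typing import Callable, List

    def rerank(candidates: List[str],
               model_logprob: Callable[[str], float],
               quality_score: Callable[[str], float],
               alpha: float = 0.5) -> str:
        # Blend model probability with an external quality estimate and
        # return the highest-scoring hypothesis.
        def combined(hyp: str) -> float:
            return (1.0 - alpha) * model_logprob(hyp) + alpha * quality_score(hyp)
        return max(candidates, key=combined)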
Assessing Out-of-Domain Language Model Performance from Few Examples
no code implementations • 13 Oct 2022 • Prasann Singhal, Jarad Forristal, Xi Ye, Greg Durrett
We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion: given a few target-domain examples and a set of models with similar training performance, can we understand how these models will perform on OOD test data?
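The naive baseline for this setup is to score each candidate model directly on the handful of labeled target-domain examples. A minimal sketch, assuming models exposed as label-predicting callables (a hypothetical interface); the paper's actual approach goes beyond this raw-accuracy baseline and is not shown here.

    from typing import Callable, List, Sequence, Tuple

    def rank_models_by_few_shot_accuracy(
            models: Sequence[Callable[[str], int]],
            ood_examples: List[Tuple[str, int]]) -> List[int]:
        # Rank candidate models by accuracy on the few labeled
        # target-domain examples; returns model indices, best first.
        def accuracy(model: Callable[[str], int]) -> float:
            hits = sum(1 for x, y in ood_examples if model(x) == y)
            return hits / len(ood_examples)
        return sorted(range(len(models)),
                      key=lambda i: accuracy(models[i]),
                      reverse=True)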