D2PO: Discriminator-Guided DPO with Response Evaluation Models
1 code implementation • 2 May 2024 • Prasann Singhal, Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett
Varied approaches for aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO.
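For concreteness, the DPO objective mentioned above reduces to a simple loss over preference pairs. Below is a minimal PyTorch sketch of standard DPO (Rafailov et al.), assuming per-response log-probabilities have already been summed under the policy and a frozen reference model; it illustrates the baseline objective, not this paper's discriminator-guided variant.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        # Log-ratios of the policy against the frozen reference model
        chosen_ratio = policy_chosen_logps - ref_chosen_logps
        rejected_ratio = policy_rejected_logps - ref_rejected_logps
        # -log sigmoid(beta * margin): minimized when preferred responses
        # gain probability relative to dispreferred ones
        return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()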
A Long Way to Go: Investigating Length Correlations in RLHF
1 code implementation • 5 Oct 2023 • Prasann Singhal, Tanya Goyal, Jiacheng Xu, Greg Durrett
Furthermore, we find that even running RLHF with a reward based solely on length can reproduce most of the downstream improvements over the initial policy model, showing that reward models in these settings have a long way to go.
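To make "a reward based solely on length" concrete, here is a hedged sketch of such a reward; the whitespace token count, cap, and normalization are illustrative assumptions, not the paper's exact parameterization. Dropping a function like this into a standard RLHF loop in place of a learned reward model is the kind of experiment described above.

    def length_only_reward(response: str, cap: int = 256) -> float:
        # Hypothetical length-only reward: more whitespace tokens means
        # higher reward, saturating at `cap` so optimization cannot grow
        # outputs without bound.
        n_tokens = len(response.split())
        return min(n_tokens, cap) / cap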
EEL: Efficiently Encoding Lattices for Reranking
1 code implementation • 1 Jun 2023 • Prasann Singhal, Jiacheng Xu, Xi Ye, Greg Durrett
Standard decoding approaches for conditional text generation tasks typically search for an output hypothesis with high model probability, but this may not yield the best hypothesis according to human judgments of quality.
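One way to act on that observation is to generate several hypotheses and rerank them with an external quality estimate instead of trusting model probability alone. The sketch below shows that generic rerank pattern; model_logprob, quality_score, and the interpolation weight alpha are hypothetical stand-ins, and the paper's lattice-encoding machinery for doing this efficiently is not shown here.

    from typing import Callable, List

    def rerank(candidates: List[str],
               model_logprob: Callable[[str], float],
               quality_score: Callable[[str], float],
               alpha: float = 0.5) -> str:
        # Blend model probability with an external quality estimate and
        # return the highest-scoring hypothesis.
        def combined(hyp: str) -> float:
            return (1.0 - alpha) * model_logprob(hyp) + alpha * quality_score(hyp)
        return max(candidates, key=combined)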
Assessing Out-of-Domain Language Model Performance from Few Examples
no code implementations • 13 Oct 2022 • Prasann Singhal, Jarad Forristal, Xi Ye, Greg Durrett
We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion: given a few target-domain examples and a set of models with similar training performance, can we understand how these models will perform on OOD test data?
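The naive baseline for this setup is to score each candidate model directly on the handful of labeled target-domain examples. A minimal sketch, assuming models exposed as label-predicting callables (a hypothetical interface); the paper's actual approach goes beyond this raw-accuracy baseline and is not shown here.

    from typing import Callable, List, Sequence, Tuple

    def rank_models_by_few_shot_accuracy(
            models: Sequence[Callable[[str], int]],
            ood_examples: List[Tuple[str, int]]) -> List[int]:
        # Rank candidate models by accuracy on the few labeled
        # target-domain examples; returns model indices, best first.
        def accuracy(model: Callable[[str], int]) -> float:
            hits = sum(1 for x, y in ood_examples if model(x) == y)
            return hits / len(ood_examples)
        return sorted(range(len(models)),
                      key=lambda i: accuracy(models[i]),
                      reverse=True)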