Search Results for author: Daria Valter

Found 1 paper, 1 paper with code

Moral Foundations of Large Language Models

1 code implementation • 23 Oct 2023 • Marwa Abdulhai, Gregory Serapio-Garcia, Clément Crepy, Daria Valter, John Canny, Natasha Jaques

Finally, we show that we can adversarially select prompts that encourage the model to exhibit a particular set of moral foundations, and that this can affect the model's behavior on downstream tasks.

