19 Feb 2024 • Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, Milad Nasr
Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior.
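To make the idea concrete, here is a minimal, illustrative sketch of a query-based adversarial suffix search: greedily mutate tokens in a suffix, keeping only mutations that raise a black-box score of the target output. This is not the authors' algorithm; the scorer below is a toy stand-in (character matching against a placeholder string) for what would, in a real attack, be a query to a model's log-probability API, and all names here are hypothetical.

```python
import random

random.seed(0)
VOCAB = list("abcdefghijklmnopqrstuvwxyz ")
# Toy stand-in for a harmful target string; purely a placeholder.
SECRET = "open the pod bay doors"

def target_score(suffix: str) -> float:
    """Toy stand-in for a black-box query returning how likely the model
    is to emit the target string; here it just counts character matches."""
    return sum(a == b for a, b in zip(suffix, SECRET))

def greedy_suffix_attack(suffix_len: int, steps: int = 20000) -> str:
    # Start from a random suffix and accept single-position mutations
    # only when they increase the target score.
    suffix = [random.choice(VOCAB) for _ in range(suffix_len)]
    best = target_score("".join(suffix))
    for _ in range(steps):
        i = random.randrange(suffix_len)
        old, suffix[i] = suffix[i], random.choice(VOCAB)
        score = target_score("".join(suffix))
        if score > best:
            best = score       # keep the mutation: target got more likely
        else:
            suffix[i] = old    # revert: mutation did not help
    return "".join(suffix)

print(greedy_suffix_attack(len(SECRET)))
```

Run as a script, the toy search recovers the placeholder string from score queries alone, which is the core point: only black-box scores, not gradients, drive the optimization.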