Search Results for author: David Dobre

Found 8 papers, 3 papers with code

Learning diverse attacks on large language models for robust red-teaming and safety tuning

no code implementations • 28 May 2024 • Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain

Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs).

Language Modelling

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

1 code implementation • 14 Feb 2024 • Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann

We address this research gap and propose the embedding space attack, which directly attacks the continuous embedding representation of input tokens.

Adversarial Robustness

In-Context Learning Can Re-learn Forbidden Tasks

no code implementations • 8 Feb 2024 • Sophie Xhonneux, David Dobre, Jian Tang, Gauthier Gidel, Dhanya Sridhar

Specifically, we investigate whether in-context learning (ICL) can be used to re-learn forbidden tasks despite the explicit fine-tuning of the model to refuse them.

In-Context Learning, Misinformation

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats

1 code implementation • 30 Oct 2023 • Leo Schwinn, David Dobre, Stephan Günnemann, Gauthier Gidel

Here, one major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations.

Raising the Bar for Certified Adversarial Robustness with Diffusion Models

no code implementations • 17 May 2023 • Thomas Altstidl, David Dobre, Björn Eskofier, Gauthier Gidel, Leo Schwinn

In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses.

Adversarial Robustness

Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features

no code implementations • 23 Apr 2023 • Aleksandr Beznosikov, David Dobre, Gauthier Gidel

Moreover, our second approach does not require either large batches or full deterministic gradients, which is a typical weakness of many techniques for finite-sum problems.

Dissecting adaptive methods in GANs

no code implementations • 9 Oct 2022 • Samy Jelassi, David Dobre, Arthur Mensch, Yuanzhi Li, Gauthier Gidel

By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training.
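
The update rule described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`adam_update`, `hybrid_step`), the use of a single global L2 norm, the raw stochastic gradient as the "SGD direction", and the default hyperparameters are all assumptions made for illustration.

```python
import math

def l2_norm(x):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))

def adam_update(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam step (to be subtracted from the parameters).

    Returns the step and the updated first/second moment estimates.
    """
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias correction
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    step = [lr * mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
    return step, m, v

def hybrid_step(grad, m, v, t, **adam_kwargs):
    """Step with the magnitude of the Adam step and the direction of the gradient.

    Assumption: magnitude and direction are taken over the whole parameter
    vector (one global norm), which the abstract does not specify.
    """
    adam_step, m, v = adam_update(grad, m, v, t, **adam_kwargs)
    g_norm = l2_norm(grad)
    if g_norm == 0.0:
        return [0.0] * len(grad), m, v
    scale = l2_norm(adam_step) / g_norm  # Adam magnitude / gradient magnitude
    return [scale * g for g in grad], m, v

# Usage: one step on a two-parameter toy problem.
grad = [3.0, 4.0]
step, m, v = hybrid_step(grad, m=[0.0, 0.0], v=[0.0, 0.0], t=1)
```

The returned step points along the gradient but has the same length as the Adam step, which separates the effect of Adam's adaptive magnitude from the effect of its per-coordinate direction.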

Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise

1 code implementation • 2 Jun 2022 • Eduard Gorbunov, Marina Danilova, David Dobre, Pavel Dvurechensky, Alexander Gasnikov, Gauthier Gidel

In this work, we prove the first high-probability complexity results with logarithmic dependence on the confidence level for stochastic methods for solving monotone and structured non-monotone VIPs with non-sub-Gaussian (heavy-tailed) noise and unbounded domains.
