no code implementations • 28 May 2024 • Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs).
1 code implementation • 14 Feb 2024 • Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann
We address this research gap and propose the embedding space attack, which directly attacks the continuous embedding representation of input tokens.
no code implementations • 8 Feb 2024 • Sophie Xhonneux, David Dobre, Jian Tang, Gauthier Gidel, Dhanya Sridhar
Specifically, we investigate whether in-context learning (ICL) can be used to re-learn forbidden tasks despite the explicit fine-tuning of the model to refuse them.
1 code implementation • 30 Oct 2023 • Leo Schwinn, David Dobre, Stephan Günnemann, Gauthier Gidel
Here, one major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations.
no code implementations • 17 May 2023 • Thomas Altstidl, David Dobre, Björn Eskofier, Gauthier Gidel, Leo Schwinn
In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses.
no code implementations • 23 Apr 2023 • Aleksandr Beznosikov, David Dobre, Gauthier Gidel
Moreover, our second approach requires neither large batches nor full deterministic gradients, a requirement that is a typical weakness of many techniques for finite-sum problems.
no code implementations • 9 Oct 2022 • Samy Jelassi, David Dobre, Arthur Mensch, Yuanzhi Li, Gauthier Gidel
By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training.
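The update rule described in this excerpt can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `adam_step` and `hybrid_step`, the hyperparameter defaults, and the use of a single norm over the whole parameter vector are assumptions made for illustration. The sketch computes a standard Adam update, keeps only its norm, and applies that norm along the normalized raw gradient direction.

```python
import numpy as np


def adam_step(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update with bias correction; returns (update, m, v)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    update = lr * m_hat / (np.sqrt(v_hat) + eps)
    return update, m, v


def hybrid_step(grad, m, v, t, **adam_kwargs):
    """Hypothetical hybrid rule: Adam's magnitude, SGD's normalized direction.

    Assumption: one global norm over the whole vector is used here;
    other groupings (for example per layer) are equally possible.
    """
    adam_update, m, v = adam_step(grad, m, v, t, **adam_kwargs)
    magnitude = np.linalg.norm(adam_update)              # step size taken from Adam
    direction = grad / (np.linalg.norm(grad) + 1e-12)    # unit vector along the raw gradient
    return magnitude * direction, m, v


# Usage: one step from zero-initialized moment estimates.
grad = np.array([3.0, 4.0])
zeros = np.zeros_like(grad)
adam_update, _, _ = adam_step(grad, zeros, zeros, t=1)
step, m, v = hybrid_step(grad, zeros, zeros, t=1)
params = np.array([1.0, 1.0]) - step  # descent step on hypothetical parameters
```

In this sketch the hybrid step has the same length as the Adam step but points along the raw gradient, so the two effects (adaptive magnitude and update direction) can be compared separately.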
1 code implementation • 2 Jun 2022 • Eduard Gorbunov, Marina Danilova, David Dobre, Pavel Dvurechensky, Alexander Gasnikov, Gauthier Gidel
In this work, we prove the first high-probability complexity results with logarithmic dependence on the confidence level for stochastic methods for solving monotone and structured non-monotone VIPs with non-sub-Gaussian (heavy-tailed) noise and unbounded domains.