no code implementations • 30 May 2024 • Jiawei Chen, Xiao Yang, Zhengwei Fang, Yu Tian, Yinpeng Dong, Zhaoxia Yin, Hang Su
Despite the widespread application of large language models (LLMs) across various tasks, recent studies indicate that they are susceptible to jailbreak attacks, which can render their defense mechanisms ineffective.
1 code implementation • 21 Sep 2023 • Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, Xiao Yang, Yichi Zhang, Yu Tian, Hang Su, Jun Zhu
Adversarial examples generated by attacking white-box surrogate vision encoders or multimodal large language models (MLLMs) can mislead Bard into outputting wrong image descriptions with a 22% success rate, relying solely on their transferability.
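As a rough illustration of this transfer setting, the sketch below runs feature-space PGD against a white-box surrogate encoder and relies on transferability alone to fool a black-box target. The surrogate choice (a torchvision ResNet-50 used as a feature extractor), the cosine-distance objective, and all hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: L-inf PGD that pushes a surrogate encoder's embedding of
# x away from the clean embedding; the perturbed image is then submitted
# to the black-box MLLM, relying purely on transferability. Surrogate and
# hyperparameters are assumptions for illustration, not the paper's setup.
import torch
import torch.nn.functional as F
import torchvision.models as models

def feature_pgd(x, encoder, eps=8/255, alpha=2/255, steps=50):
    encoder.eval()
    with torch.no_grad():
        clean_feat = encoder(x).flatten(1)  # embedding of the clean image
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        feat = encoder((x + delta).clamp(0, 1)).flatten(1)
        # Ascending this loss decreases cosine similarity to the clean
        # embedding, i.e. maximizes feature-space distance.
        loss = -F.cosine_similarity(feat, clean_feat).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # signed gradient ascent
            delta.clamp_(-eps, eps)             # project to L-inf ball
            delta.grad = None
    return (x + delta).clamp(0, 1).detach()

# Usage: craft on the surrogate, then query the black-box target model.
# (Input normalization is omitted for brevity.)
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
surrogate = torch.nn.Sequential(*list(resnet.children())[:-1])  # drop fc
x = torch.rand(1, 3, 224, 224)  # stand-in for a preprocessed image
x_adv = feature_pgd(x, surrogate)
```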
1 code implementation • 24 Sep 2022 • Zhengwei Fang, Rui Wang, Tao Huang, Liping Jing
Strong adversarial examples are crucial for evaluating and enhancing the robustness of deep neural networks.
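For context, the sketch below shows standard untargeted L-inf PGD, the common baseline for crafting such adversarial examples when evaluating robustness. It illustrates the general technique only, not the specific attack proposed in this paper; the model interface and hyperparameters are assumptions.

```python
# Minimal sketch of standard untargeted L-inf PGD (a generic baseline,
# not this paper's method): iteratively ascend the cross-entropy loss
# under an L-inf perturbation budget. Hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=20):
    # Random start inside the L-inf ball, clipped to valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to ball
            x_adv = x_adv.clamp(0, 1)                 # valid pixels
    return x_adv.detach()

# Usage: robust accuracy is the model's accuracy on pgd_attack outputs.
```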