Search Results for author: Zijing Fan

Found 1 papers, 0 papers with code

Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent

no code implementations • 6 May 2024 • Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang

To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures.

Intent Detection

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.