Search Results for author: Zijing Fan

Found 1 paper, 0 papers with code

Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent

no code implementations • 6 May 2024 • Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang

To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures.

Intent Detection
