1 code implementation • 9 May 2024 • Xikang Yang, Xuehai Tang, Songlin Hu, Jizhong Han
CoA is a semantic-driven, contextual multi-turn attack method that adaptively adjusts its attack policy using contextual feedback and semantic relevance over multiple turns of dialogue with a large model, causing the model to produce unreasonable or harmful content.