Search Results for author: Bislan Ashinov

Found 1 paper, 0 papers with code

Trojan Detection in Large Language Models: Insights from The Trojan Detection Challenge

no code implementations • 21 Apr 2024 • Narek Maloyan, Ekansh Verma, Bulat Nutfullin, Bislan Ashinov

The phenomenon of unintended triggers and the difficulty in distinguishing them from intended triggers highlight the need for further research into the robustness and interpretability of LLMs.
