14 Feb 2024 • Maryam Amirizaniani, Jihan Yao, Adrian Lavergne, Elizabeth Snell Okada, Aman Chadha, Tanya Roosta, Chirag Shah
A case study using questions from the TruthfulQA dataset demonstrates that we can generate a reliable set of probes from one LLM that can be used to audit inconsistencies in a different LLM.
9 Feb 2024 • Yuta Saito, Jihan Yao, Thorsten Joachims
We also show that POTEC provides a strict generalization of policy- and regression-based approaches and their associated assumptions.