2 code implementations • 16 Jun 2023 • Lukas Fluri, Daniel Paleka, Florian Tramèr
If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth?