Search Results for author: Francis Rhys Ward

Found 4 papers, 0 papers with code

The Reasons that Agents Act: Intention and Instrumental Goals

no code implementations • 11 Feb 2024 • Francis Rhys Ward, Matt MacDermott, Francesco Belardinelli, Francesca Toni, Tom Everitt

In addition, we show how our definition relates to past concepts, including actual causality, and the notion of instrumental goals, which is a core idea in the literature on safe AI agents.

Philosophy

Honesty Is the Best Policy: Defining and Mitigating AI Deception

no code implementations • NeurIPS 2023 • Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt

There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games.

Philosophy

Experiments with Detecting and Mitigating AI Deception

no code implementations • 26 Jun 2023 • Ismail Sahbane, Francis Rhys Ward, C Henrik Åslund

How to detect and mitigate deceptive AI systems is an open problem for the field of safe and trustworthy AI.

Argumentative Reward Learning: Reasoning About Human Preferences

no code implementations • 28 Sep 2022 • Francis Rhys Ward, Francesco Belardinelli, Francesca Toni

We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference-based argumentation with existing approaches to reinforcement learning from human feedback.

Reinforcement Learning (RL)
