no code implementations • ICML 2020 • Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls
We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential imperfect information form.
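A minimal sketch of the flavour of this idea, not the paper's actual MAIO algorithm: mirror ascent (here, exponentiated gradient with a decaying step size) for the row player against a best-responding, i.e. "improved", opponent in a toy zero-sum matrix game. The payoff matrix and schedule are illustrative; the standard no-regret argument gives that the averaged iterate approximates a maximin (Nash) strategy.

```python
import numpy as np

A = np.array([[0.0, -1.0, 1.0],   # rock-paper-scissors payoff for the row player
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

x = np.ones(3) / 3                # row player's mixed strategy
avg_x = np.zeros(3)
T = 5000

for t in range(T):
    y = np.zeros(3)
    y[np.argmin(x @ A)] = 1.0     # opponent best-responds: an "improved" opponent
    eta = 1.0 / np.sqrt(t + 1)    # decaying mirror-ascent step size
    x = x * np.exp(eta * (A @ y)) # exponentiated-gradient (mirror) ascent step
    x /= x.sum()
    avg_x += x

avg_x /= T
print("average strategy ~ Nash:", np.round(avg_x, 3))  # ~ (1/3, 1/3, 1/3) for RPS
```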
no code implementations • 13 Mar 2024 • Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot
Building on this equivalence, we introduce the IPO-MD algorithm, which generates data with a mixture policy (between the online and reference policies), similarly to the general Nash-MD algorithm.
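A minimal sketch of one natural reading of such a mixture policy, assuming a geometric mixture between the online and reference policies (as in Nash-MD); the vocabulary size, logits, and `beta` below are illustrative, not the paper's setup.

```python
import numpy as np

def sample_from_mixture(online_logits, ref_logits, beta, rng):
    """Sample one token from pi_mix ∝ pi_online^(1-beta) * pi_ref^beta.

    In log space a geometric mixture is a convex combination of the two
    logits followed by a softmax (normalisation constants are absorbed).
    beta=0 recovers the online policy, beta=1 the reference policy.
    """
    mix_logits = (1.0 - beta) * online_logits + beta * ref_logits
    mix_logits -= mix_logits.max()            # numerical stability
    probs = np.exp(mix_logits)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
online = rng.normal(size=10)   # toy next-token logits from the online policy
ref = rng.normal(size=10)      # toy logits from the frozen reference policy
token = sample_from_mixture(online, ref, beta=0.5, rng=rng)
print(token)
```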
1 code implementation • 13 Feb 2024 • Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, André Barreto, Will Dabney, Marc G. Bellemare, Mark Rowland
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process.
Tasks: Distributional Reinforcement Learning, Model-based Reinforcement Learning, +1
no code implementations • 12 Feb 2024 • Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney
We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open question of Zhang et al. (2023).
Tasks: Distributional Reinforcement Learning, Reinforcement Learning, +1
no code implementations • 8 Feb 2024 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney
We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms.
no code implementations • 8 Feb 2024 • Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot
The GPO framework also sheds light on how offline algorithms enforce regularization, through the design of the convex function that defines the loss.
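A hedged sketch of the GPO-style loss template: an offline preference loss of the form f(beta * rho), where rho is the preferred-minus-dispreferred difference of policy/reference log-ratios and f is a convex function. Different choices of f recover known algorithms (logistic for DPO, squared for IPO, hinge for SLiC); the exact parametrizations and constants may differ from the paper, and the tail behaviour of f is what controls the strength of regularization.

```python
import numpy as np

def rho(logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Log-ratio difference between preferred (w) and dispreferred (l) responses."""
    return (logp_w - ref_logp_w) - (logp_l - ref_logp_l)

# Convex losses over x = beta * rho; named correspondences are indicative.
F = {
    "dpo_logistic": lambda x: np.log1p(np.exp(-x)),    # logistic loss -> DPO
    "ipo_squared":  lambda x: (x - 0.5) ** 2,          # squared loss  -> IPO
    "slic_hinge":   lambda x: np.maximum(0.0, 1 - x),  # hinge loss    -> SLiC
}

beta = 0.1
r = rho(logp_w=-12.3, logp_l=-14.0, ref_logp_w=-12.5, ref_logp_l=-13.0)
for name, f in F.items():
    print(name, f(beta * r))
```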
1 code implementation • 9 Dec 2023 • Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland
We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.
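A minimal sketch of the representation itself, assuming random Fourier features for a Gaussian kernel: a finite-dimensional mean embedding of return samples, under which two return distributions can be compared through an MMD-style distance. The paper learns such embeddings with Bellman-style updates; this only shows the object being learned.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128                                  # embedding dimension (number of features)
omega = rng.normal(size=d)               # frequencies for a bandwidth-1 Gaussian kernel
b = rng.uniform(0, 2 * np.pi, size=d)

def mean_embedding(returns):
    """Average of phi(g) = sqrt(2/d) * cos(omega * g + b) over return samples g."""
    phi = np.sqrt(2.0 / d) * np.cos(np.outer(returns, omega) + b)
    return phi.mean(axis=0)

g1 = rng.normal(1.0, 0.5, size=10_000)   # return samples from two hypothetical states
g2 = rng.normal(1.2, 0.5, size=10_000)
mmd = np.linalg.norm(mean_embedding(g1) - mean_embedding(g2))
print("embedding distance (approx. MMD):", mmd)
```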
no code implementations • 1 Dec 2023 • Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot
We term this approach Nash learning from human feedback (NLHF).
1 code implementation • 18 Oct 2023 • Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos
In particular, we derive a new general objective called $\Psi$PO for learning from human preferences that is expressed in terms of pairwise preferences and therefore bypasses both approximations.
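A hedged sketch of the best-known special case of $\Psi$PO (identity $\Psi$, i.e. IPO): a squared loss that regresses the preferred-minus-dispreferred log-ratio difference h to a fixed target 1/(2*tau). The point of bypassing the reward-model approximations is visible numerically: the optimal h stays bounded even for deterministic preferences, rather than being pushed to infinity. Constants and naming (tau) are illustrative.

```python
def ipo_loss(h, tau):
    return (h - 1.0 / (2.0 * tau)) ** 2

def ipo_grad(h, tau):
    return 2.0 * (h - 1.0 / (2.0 * tau))   # gradient wrt h, to be backpropagated

tau = 0.1
h = 0.0
for _ in range(200):                        # gradient descent drives h -> 1/(2*tau) = 5
    h -= 0.1 * ipo_grad(h, tau)
print(h)  # ~5.0: the log-ratio gap is capped by the regularizer
```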
no code implementations • 5 Oct 2023 • Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland
Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning.
no code implementations • 16 Jun 2023 • Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney
In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988).
no code implementations • 29 May 2023 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko
Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.
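A minimal sketch of the basic multi-step lookahead object in policy evaluation: computing lambda-returns along a trajectory by backward recursion. The rewards and values below are toy numbers.

```python
import numpy as np

def lambda_returns(rewards, values, gamma, lam):
    """Backward recursion G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})."""
    T = len(rewards)
    G = np.zeros(T)
    nxt = values[-1]                        # bootstrap from the final state value
    for t in reversed(range(T)):
        nxt = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * nxt)
        G[t] = nxt
    return G

rewards = np.array([1.0, 0.0, 0.5, 1.0])
values = np.array([0.9, 0.8, 0.7, 0.6, 0.5])   # V(s_0..s_4), one per visited state
print(lambda_returns(rewards, values, gamma=0.99, lam=0.9))
```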
no code implementations • 29 May 2023 • Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko
In reinforcement learning, the advantage function is critical for policy improvement, but is often extracted from a learned Q-function.
no code implementations • 28 May 2023 • Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney
We study the problem of temporal-difference-based policy evaluation in reinforcement learning.
Tasks: Distributional Reinforcement Learning, Reinforcement Learning
no code implementations • 11 Jan 2023 • Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.
Tasks: Distributional Reinforcement Learning, Reinforcement Learning, +1
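A minimal sketch of the QTD update the paper analyses, in tabular form for a single state: each quantile estimate theta[i] at level tau_i moves up or down depending on whether a sampled target exceeds it. The step size, discount, and fixed next-state quantiles are illustrative.

```python
import numpy as np

m = 5
tau = (np.arange(m) + 0.5) / m             # quantile midpoint levels
theta = np.zeros(m)                        # quantile estimates for the current state
theta_next = np.ones(m)                    # quantile estimates for the next state
alpha, gamma, reward = 0.05, 0.99, 1.0

rng = np.random.default_rng(0)
for _ in range(10_000):
    target = reward + gamma * theta_next[rng.integers(m)]  # sample one target quantile
    theta += alpha * (tau - (target < theta))              # QTD increment per level
print(np.round(theta, 2))
```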
no code implementations • 8 Dec 2022 • Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G. Bellemare
In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns.
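A minimal sketch of the streaming-subspace idea, assuming classical Oja-style updates from sampled data vectors; the paper's algorithm works from sampled entries and neural-network representations, which this does not capture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 3
true_basis = np.linalg.qr(rng.normal(size=(d, k)))[0]      # planted 3-dim subspace

U = np.linalg.qr(rng.normal(size=(d, k)))[0]               # current subspace estimate
eta = 0.01
for _ in range(20_000):
    x = true_basis @ rng.normal(size=k) + 0.05 * rng.normal(size=d)  # noisy sample
    U += eta * np.outer(x, x @ U)                          # Oja update: U += eta * x x^T U
    U, _ = np.linalg.qr(U)                                 # re-orthonormalize

overlap = np.linalg.norm(true_basis.T @ U)                 # ~ sqrt(k) when aligned
print(overlap, np.sqrt(k))
```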
no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse.
1 code implementation • 28 Sep 2022 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard
We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions.
no code implementations • 22 Aug 2022 • Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, Karl Tuyls
The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts.
no code implementations • 15 Jul 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare
We study the multi-step off-policy learning approach to distributional RL.
Tasks: Distributional Reinforcement Learning, Reinforcement Learning, +1
1 code implementation • 30 Jun 2022 • Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls
It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes).
no code implementations • 17 Jun 2022 • Shantanu Thakoor, Mark Rowland, Diana Borsa, Will Dabney, Rémi Munos, André Barreto
We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL.
no code implementations • 5 Jun 2022 • Clare Lyle, Mark Rowland, Will Dabney, Marta Kwiatkowska, Yarin Gal
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations.
no code implementations • ICLR 2022 • Clare Lyle, Mark Rowland, Will Dabney
The reinforcement learning (RL) problem is rife with sources of non-stationarity, making it a notoriously difficult problem domain for the application of neural networks.
no code implementations • 30 Mar 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko
We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases.
no code implementations • 28 Jun 2021 • Georgios Piliouras, Mark Rowland, Shayegan Omidshafiei, Romuald Elie, Daniel Hennes, Jerome Connor, Karl Tuyls
Importantly, $\Phi$-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms.
1 code implementation • NeurIPS 2021 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko
Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions.
no code implementations • 11 Jun 2021 • Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.
2 code implementations • NeurIPS 2021 • Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland
We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents.
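A hedged sketch of a MICo-style behavioural distance as a tabular fixed-point iteration on a small Markov reward process, d(x, y) = |r(x) - r(y)| + gamma * E[d(X', Y')] with next states sampled independently. The paper's distance and its use for shaping deep RL representations go well beyond this recursion; the chain below is a toy example.

```python
import numpy as np

n = 4
P = np.array([[0.9, 0.1, 0.0, 0.0],        # toy transition matrix
              [0.0, 0.8, 0.2, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.5, 0.0, 0.0, 0.5]])
r = np.array([0.0, 0.1, 0.9, 1.0])
gamma = 0.9

d = np.zeros((n, n))
for _ in range(500):                        # iterate the contraction to its fixed point
    # E_{x'~P_x, y'~P_y}[d(x', y')] under independent coupling is P d P^T
    d = np.abs(r[:, None] - r[None, :]) + gamma * P @ d @ P.T
print(np.round(d, 2))
```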
no code implementations • 27 Feb 2021 • Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel
These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.
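A minimal sketch of Peng's Q(lambda) targets along one trajectory, via the backward recursion G_t = r_t + gamma * ((1 - lam) * max_a Q(s_{t+1}, a) + lam * G_{t+1}), which mixes bootstrapped greedy values with the sampled continuation. The trajectory and Q-values below are toy data.

```python
import numpy as np

def pengs_q_lambda_targets(rewards, q_next, gamma, lam):
    """q_next[t] holds the vector Q(s_{t+1}, .) for each transition t."""
    T = len(rewards)
    G = np.zeros(T)
    nxt = q_next[-1].max()                          # bootstrap at the end
    for t in reversed(range(T)):
        nxt = rewards[t] + gamma * ((1 - lam) * q_next[t].max() + lam * nxt)
        G[t] = nxt
    return G

rewards = np.array([0.0, 1.0, 0.0])
q_next = np.array([[0.2, 0.5], [0.1, 0.4], [0.3, 0.0]])  # rows: Q(s_{t+1}, a)
print(pengs_q_lambda_targets(rewards, q_next, gamma=0.99, lam=0.8))
```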
no code implementations • 25 Feb 2021 • Clare Lyle, Mark Rowland, Georg Ostrovski, Will Dabney
While auxiliary tasks play a key role in shaping the representations learnt by reinforcement learning agents, much is still unknown about the mechanisms through which this is achieved.
1 code implementation • 18 Nov 2020 • Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, Ali Eslami, Mark Rowland, Andrew Jaegle, Remi Munos, Trevor Back, Razia Ahamed, Simon Bouton, Nathalie Beauguerlange, Jackson Broshear, Thore Graepel, Demis Hassabis
The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis.
2 code implementations • ICML 2020 • William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.
no code implementations • 3 Jun 2020 • Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver
To test our hypothesis empirically, we augmented a standard deep RL agent with an auxiliary task of learning the value-improvement path.
no code implementations • 4 May 2020 • Shayegan Omidshafiei, Karl Tuyls, Wojciech M. Czarnecki, Francisco C. Santos, Mark Rowland, Jerome Connor, Daniel Hennes, Paul Muller, Julien Perolat, Bart De Vylder, Audrunas Gruslys, Remi Munos
Multiplayer games have long been used as testbeds in artificial intelligence research, aptly referred to as the Drosophila of artificial intelligence.
no code implementations • 19 Feb 2020 • Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls
In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG).
no code implementations • 16 Oct 2019 • Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney
We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
no code implementations • 16 Oct 2019 • Mark Rowland, Will Dabney, Rémi Munos
A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms.
1 code implementation • ICLR 2020 • Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Si-Qi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos
This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO).
1 code implementation • NeurIPS 2019 • Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos
This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents.
no code implementations • 8 May 2019 • Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class.
no code implementations • 9 Mar 2019 • Mark Rowland, Jiri Hron, Yunhao Tang, Krzysztof Choromanski, Tamas Sarlos, Adrian Weller
Wasserstein distances are increasingly used in a wide variety of applications in machine learning.
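A minimal sketch of the classical special case the paper's estimators build on: in one dimension with equal sample counts, the empirical p-Wasserstein distance reduces to comparing sorted samples.

```python
import numpy as np

def wasserstein_1d(xs, ys, p=1):
    """Empirical p-Wasserstein distance; the optimal 1-D coupling sorts both samples."""
    xs, ys = np.sort(xs), np.sort(ys)
    return (np.abs(xs - ys) ** p).mean() ** (1.0 / p)

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=100_000)
b = rng.normal(0.5, 1.0, size=100_000)
print(wasserstein_1d(a, b, p=1))   # ~0.5 for two unit Gaussians shifted by 0.5
```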
1 code implementation • 4 Mar 2019 • Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Perolat, Remi Munos
We introduce $\alpha$-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs).
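A hedged sketch of the simplified single-population case on a toy symmetric game: pairwise transitions between monomorphic populations use a standard fixation probability, and the ranking is the stationary distribution of the resulting Markov chain. Population size m, intensity alpha, and the mutation scheme are illustrative; the paper's construction is more general.

```python
import numpy as np

A = np.array([[0.0, -1.0, 1.0],        # rock-paper-scissors payoffs
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
n, m, alpha = A.shape[0], 10, 1.0

def fixation(delta):
    """Probability a single mutant with payoff advantage delta takes over."""
    if abs(delta) < 1e-12:
        return 1.0 / m                  # neutral drift
    return (1 - np.exp(-alpha * delta)) / (1 - np.exp(-alpha * m * delta))

C = np.zeros((n, n))
for s in range(n):                      # resident strategy
    for t in range(n):                  # mutant strategy
        if s != t:
            C[s, t] = fixation(A[t, s] - A[s, t]) / (n - 1)
    C[s, s] = 1.0 - C[s].sum()

pi = np.ones(n) / n
for _ in range(10_000):                 # power iteration for the stationary distribution
    pi = pi @ C
print(np.round(pi, 3))                  # symmetric game -> uniform ranking here
```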
no code implementations • 21 Feb 2019 • Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney
We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution.
Tasks: Distributional Reinforcement Learning, Reinforcement Learning, +1
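A minimal instance of the "recursively estimating statistics of the return" viewpoint: in a Markov reward process with deterministic state rewards, the first two moments of the return satisfy their own Bellman-style recursions, M1(s) = r(s) + gamma * E[M1(S')] and M2(s) = r(s)^2 + 2*gamma*r(s)*E[M1(S')] + gamma^2 * E[M2(S')], and can be estimated jointly by iteration. The paper characterises when such recursive estimation is possible in general.

```python
import numpy as np

P = np.array([[0.5, 0.5], [0.2, 0.8]])     # toy 2-state transition matrix
r = np.array([1.0, 0.0])                   # deterministic state rewards
gamma = 0.9

M1, M2 = np.zeros(2), np.zeros(2)
for _ in range(1000):
    EM1, EM2 = P @ M1, P @ M2              # expected next-state moments
    M1 = r + gamma * EM1
    M2 = r**2 + 2 * gamma * r * EM1 + gamma**2 * EM2
var = M2 - M1**2                           # return variance per state
print(np.round(M1, 3), np.round(var, 3))
```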
no code implementations • NeurIPS 2018 • Mark Rowland, Krzysztof M. Choromanski, François Chalus, Aldo Pacchiano, Tamas Sarlos, Richard E. Turner, Adrian Weller
Monte Carlo sampling in high-dimensional, low-sample settings is important in many machine learning tasks.
no code implementations • 1 Jul 2018 • Maria Lomeli, Mark Rowland, Arthur Gretton, Zoubin Ghahramani
We also present a novel variance reduction scheme based on an antithetic variate construction between permutations to obtain an improved estimator for the Mallows kernel.
2 code implementations • ICLR 2018 • Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, Zoubin Ghahramani
Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties.
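A hedged sketch of the object this line of work studies: the Gaussian-process kernel of an infinitely wide, fully connected ReLU network, computed by the standard layer-wise arc-cosine recursion. Depth, weight and bias variances below are illustrative, and the paper's analysis covers much more than this recursion.

```python
import numpy as np

def nngp_relu_kernel(x1, x2, depth=3, sw2=2.0, sb2=0.1):
    """Layer-wise NNGP recursion K^{l+1} = sb2 + sw2 * E[relu(u) relu(v)]."""
    k11 = sb2 + sw2 * x1 @ x1 / len(x1)     # input-layer covariances
    k22 = sb2 + sw2 * x2 @ x2 / len(x2)
    k12 = sb2 + sw2 * x1 @ x2 / len(x1)
    for _ in range(depth):
        theta = np.arccos(np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0))
        # closed form for E[relu(u) relu(v)] under the current Gaussian covariances
        cross = np.sqrt(k11 * k22) / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))
        k12 = sb2 + sw2 * cross
        k11 = sb2 + sw2 * k11 / 2           # E[relu(u)^2] = k11 / 2
        k22 = sb2 + sw2 * k22 / 2
    return k12

x1, x2 = np.array([1.0, 0.0]), np.array([0.6, 0.8])
print(nngp_relu_kernel(x1, x2))
```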
no code implementations • ICML 2018 • Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E. Turner, Adrian Weller
We present a new method of blackbox optimization via gradient approximation with the use of structured random orthogonal matrices, providing more accurate estimators than baselines and with provable theoretical guarantees.
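A minimal sketch of the estimator family, assuming the simplest orthogonal construction: an antithetic evolution-strategies gradient estimate whose perturbation directions are rows of an orthogonalized Gaussian matrix, rescaled to typical Gaussian norms. Coupling directions this way reduces estimator variance versus i.i.d. Gaussian directions; the paper's structured matrices and guarantees go beyond this sketch.

```python
import numpy as np

def es_gradient(f, x, n_dirs, sigma, rng):
    d = len(x)
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))     # orthonormal directions
    eps = Q[:n_dirs] * np.sqrt(d)                    # rescale: E||N(0, I_d)||^2 = d
    grad = np.zeros(d)
    for e in eps:
        # antithetic finite difference along each orthogonal direction
        grad += (f(x + sigma * e) - f(x - sigma * e)) / (2 * sigma) * e
    return grad / n_dirs

f = lambda x: -np.sum(x ** 2)                        # maximize -> optimum at 0
rng = np.random.default_rng(0)
x = rng.normal(size=10)
for _ in range(200):
    x += 0.05 * es_gradient(f, x, n_dirs=5, sigma=0.1, rng=rng)
print(np.round(x, 3))                                # near the origin
```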
no code implementations • 22 Feb 2018 • Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh
Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance.
Tasks: Distributional Reinforcement Learning, Reinforcement Learning, +1
no code implementations • NeurIPS 2017 • Mark Rowland, Adrian Weller
The idea of uprooting and rerooting graphical models was introduced specifically for binary pairwise models by Weller (2016) as a way to transform a model to any of a whole equivalence class of related models, such that inference on any one model yields inference results for all others.
17 code implementations • 27 Oct 2017 • Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos
In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean.
Ranked #1 on Atari 2600 Pong (Atari Games)
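A minimal sketch of the asymmetric quantile Huber loss at the heart of this quantile-regression approach (QR-DQN-style): each quantile estimate is regressed against all target samples, with the slope of the penalty set by its level tau. Shapes and hyperparameters below are illustrative.

```python
import numpy as np

def quantile_huber_loss(theta, targets, kappa=1.0):
    """theta: (m,) quantile estimates; targets: (k,) samples of r + gamma * Z'."""
    m = len(theta)
    tau = (np.arange(m) + 0.5) / m
    u = targets[None, :] - theta[:, None]            # TD errors, shape (m, k)
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u**2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    weight = np.abs(tau[:, None] - (u < 0))          # asymmetric quantile weighting
    return (weight * huber / kappa).mean()

theta = np.zeros(5)
targets = np.array([0.2, 0.9, 1.5])
print(quantile_huber_loss(theta, targets))
```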
2 code implementations • NeurIPS 2017 • Krzysztof Choromanski, Mark Rowland, Adrian Weller
We examine a class of embeddings based on structured random matrices with orthogonal rows which can be applied in many machine learning applications including dimensionality reduction and kernel approximation.
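A hedged sketch of one such embedding: Gaussian (RBF) kernel approximation with orthogonal random features, where i.i.d. Gaussian projection rows are replaced by orthogonalized rows with norms resampled from the Gaussian distribution. The small feature count here keeps the demo short; in practice more features (stacked orthogonal blocks) sharpen the approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 8, 8                                          # input dim, number of features (D <= d)

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))         # orthonormal rows
norms = np.linalg.norm(rng.normal(size=(D, d)), axis=1)
W = Q[:D] * norms[:, None]                           # orthogonal rows, chi-distributed norms
b = rng.uniform(0, 2 * np.pi, size=D)

def features(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = rng.normal(size=d) / np.sqrt(d)
y = rng.normal(size=d) / np.sqrt(d)
exact = np.exp(-np.sum((x - y) ** 2) / 2)            # RBF kernel, bandwidth 1
print(exact, features(x) @ features(y))              # exact vs. random-feature estimate
```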
no code implementations • ICML 2017 • Nilesh Tripuraneni, Mark Rowland, Zoubin Ghahramani, Richard Turner
We establish a theoretical basis for the use of non-canonical Hamiltonian dynamics in MCMC, and construct a symplectic, leapfrog-like integrator allowing for the implementation of magnetic HMC.
3 code implementations • 10 Nov 2015 • José Miguel Hernández-Lobato, Yingzhen Li, Mark Rowland, Daniel Hernández-Lobato, Thang Bui, Richard E. Turner
Black-box alpha (BB-$\alpha$) is a new approximate inference method based on the minimization of $\alpha$-divergences.