no code implementations • 3 May 2024 • Ksenija Stepanovic, Wendelin Böhmer, Mathijs de Weerdt
This algorithm adapts a standard penalty-based method by dynamically updating the right-hand side of the constraints with a guardrail variable that adds a margin to prevent violations.
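The guardrail idea can be sketched on a toy problem (the problem, numbers, and update rule below are hypothetical illustrations, not the authors' implementation): a quadratic penalty is applied against a *tightened* right-hand side `b - margin`, and the margin grows whenever the true constraint is violated.

```python
# Hypothetical toy problem: minimise f(x) = (x - 2)^2 subject to g(x) = x <= b = 1.
f_grad = lambda x: 2.0 * (x - 2.0)
g = lambda x: x           # constraint function
b = 1.0                   # true right-hand side
rho = 10.0                # penalty weight
margin = 0.0              # guardrail: tightens the effective right-hand side
x, lr = 0.0, 0.05

for _ in range(500):
    # Penalise violations of the *tightened* constraint g(x) <= b - margin.
    viol = max(0.0, g(x) - (b - margin))
    grad = f_grad(x) + 2.0 * rho * viol   # d/dx of penalty term (g'(x) = 1 here)
    x -= lr * grad
    # Guardrail update: if the true constraint is still violated, widen the margin.
    if g(x) > b:
        margin = min(margin + 0.01, 0.5)

print(x)  # close to the constrained optimum x* = 1, without large violations
```

A plain penalty method with a finite weight `rho` settles slightly on the infeasible side; the growing margin pushes the penalised optimum back inside the feasible region.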
no code implementations • 2 Feb 2024 • Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt
In reinforcement learning (RL), different rewards can define the same optimal policy but result in drastically different learning performance.
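A classic instance of this phenomenon is potential-based reward shaping, which provably preserves the optimal policy (the toy MDP below is a hypothetical example, not from the paper):

```python
import numpy as np

# Toy deterministic 3-state chain: actions 0 = left, 1 = right.
# Potential-based shaping r'(s,a,s') = r + gamma*phi[s'] - phi[s]
# changes the rewards but not the optimal policy.
gamma = 0.9
nS, nA = 3, 2
P = np.zeros((nS, nA), dtype=int)   # deterministic successor states
R = np.zeros((nS, nA))              # rewards
for s in range(nS):
    P[s, 0] = max(s - 1, 0)
    P[s, 1] = min(s + 1, nS - 1)
R[1, 1] = 1.0                       # moving right from state 1 pays 1

phi = np.array([0.0, 5.0, -3.0])    # arbitrary potential function

def greedy_policy(R):
    V = np.zeros(nS)
    for _ in range(200):            # value iteration
        Q = R + gamma * V[P]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

R_shaped = R + gamma * phi[P] - phi[:, None]
print(greedy_policy(R), greedy_policy(R_shaped))  # identical greedy policies
```

Although both reward functions yield the same greedy policy, the intermediate value estimates differ, which is exactly the kind of gap that can produce drastically different learning performance.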
no code implementations • 30 Jul 2023 • Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt
The key challenge to train such models is the computation of the Jacobian of the solution of the optimization problem with respect to its parameters.
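For smooth unconstrained inner problems, that Jacobian is typically obtained from the implicit function theorem: differentiating the optimality condition \(\nabla_x f(x^*, \theta) = 0\) gives \(dx^*/d\theta = -(\nabla_x^2 f)^{-1}\, \nabla_{x\theta}^2 f\). A minimal sketch on a quadratic (a hypothetical example where the answer is known in closed form):

```python
import numpy as np

# f(x, theta) = 0.5 x^T A x - theta^T x, so x*(theta) = A^{-1} theta
# and the Jacobian dx*/dtheta should equal A^{-1}.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive-definite Hessian
theta = np.array([1.0, -1.0])

x_star = np.linalg.solve(A, theta)       # solve the inner optimisation problem

H = A                                    # d^2 f / dx^2 at the optimum
cross = -np.eye(2)                       # d^2 f / (dx dtheta)
J = -np.linalg.solve(H, cross)          # implicit-function-theorem Jacobian

print(J)                                 # equals A^{-1}
```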
no code implementations • 12 Jun 2023 • Moritz A. Zanger, Wendelin Böhmer, Matthijs T. J. Spaan
In contrast to classical reinforcement learning, distributional reinforcement learning algorithms aim to learn the distribution of returns rather than their expected value.
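One widely used building block in this family is the categorical projection of C51-style algorithms, which maps the Bellman-shifted return distribution back onto a fixed support (a generic sketch of that projection, not necessarily the paper's method):

```python
import numpy as np

def project(probs, atoms, r, gamma):
    """Project the shifted distribution of r + gamma*Z onto the fixed
    support `atoms` (the categorical projection used in C51-style
    distributional RL)."""
    v_min, v_max = atoms[0], atoms[-1]
    dz = atoms[1] - atoms[0]
    out = np.zeros_like(probs)
    for p, z in zip(probs, atoms):
        tz = np.clip(r + gamma * z, v_min, v_max)  # Bellman-shifted atom
        b = (tz - v_min) / dz                      # fractional index on support
        l, u = int(np.floor(b)), int(np.ceil(b))
        if l == u:
            out[l] += p
        else:                                      # split mass between neighbours
            out[l] += p * (u - b)
            out[u] += p * (b - l)
    return out

atoms = np.linspace(-1.0, 1.0, 5)                  # support: -1, -0.5, 0, 0.5, 1
probs = np.full(5, 0.2)                            # uniform return distribution
new = project(probs, atoms, r=0.1, gamma=0.9)
print(new.sum())                                   # still a valid distribution
```

When no atoms are clipped, the linear interpolation preserves the mean, so the expected value matches the classical Bellman target while the full distribution is retained.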
no code implementations • 9 Jun 2023 • Max Weltevrede, Matthijs T. J. Spaan, Wendelin Böhmer
We motivate mathematically and show empirically that generalisation to tasks that are "reachable" during training is improved by increasing the diversity of transitions in the replay buffer.
no code implementations • 6 Dec 2022 • Álvaro Serra-Gómez, Eduardo Montijano, Wendelin Böhmer, Javier Alonso-Mora
In this paper, we consider the problem where a drone has to collect semantic information to classify multiple moving targets.
no code implementations • 21 Oct 2022 • Yaniv Oren, Matthijs T. J. Spaan, Wendelin Böhmer
One of the best-studied and highest-performing planning approaches used in Model-Based Reinforcement Learning (MBRL) is Monte-Carlo Tree Search (MCTS).
no code implementations • 6 Oct 2020 • Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon Whiteson
VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities.
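The two factorisations can be contrasted in a few lines (a toy single-state sketch with hypothetical utilities; QMIX's mixing network is reduced here to one layer of non-negative weights):

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-agent utilities Q_i(a_i) for 2 agents with 3 actions each (toy numbers).
q1 = np.array([0.1, 0.5, 0.2])
q2 = np.array([0.3, 0.0, 0.4])

# VDN: Q_tot = Q_1 + Q_2 (additive factorisation).
vdn = q1[:, None] + q2[None, :]

# QMIX-style sketch: non-negative mixing weights guarantee that
# dQ_tot/dQ_i >= 0, i.e. the mixing is monotonic in each utility.
w = np.abs(rng.normal(size=2))          # weights forced non-negative
qtot = w[0] * q1[:, None] + w[1] * q2[None, :]

# Monotonicity means decentralised per-agent argmaxes recover the joint argmax.
print(np.unravel_index(qtot.argmax(), qtot.shape), (q1.argmax(), q2.argmax()))
```

This monotonicity is the property that lets each agent act greedily on its own utility while remaining consistent with the centralised value function.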
2 code implementations • 7 Jun 2020 • Shariq Iqbal, Christian A. Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities.
3 code implementations • NeurIPS 2021 • Bei Peng, Tabish Rashid, Christian A. Schroeder de Witt, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Böhmer, Shimon Whiteson
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
1 code implementation • ICLR 2020 • Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson
We show that this scheme is provably efficient in the tabular setting and extend it to the deep RL setting.
2 code implementations • ICML 2020 • Wendelin Böhmer, Vitaly Kurin, Shimon Whiteson
This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning.
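A coordination graph factors the joint value into per-agent utilities plus pairwise payoffs along graph edges. The sketch below (a hypothetical 3-agent line graph with made-up payoffs) maximises such a decomposition by brute force; DCG itself parameterises the terms with neural networks and maximises via max-plus message passing:

```python
import numpy as np
from itertools import product

# Q(a) = sum_i f_i(a_i) + sum_{(i,j) in edges} f_ij(a_i, a_j)
# on a line graph 0-1, 1-2 with 2 actions per agent.
f = [np.array([0.0, 0.1]), np.array([0.2, 0.0]), np.array([0.0, 0.3])]
f01 = np.array([[1.0, 0.0], [0.0, 1.0]])   # agents 0,1 prefer matching actions
f12 = np.array([[0.0, 1.0], [1.0, 0.0]])   # agents 1,2 prefer differing actions

def Q(a):
    return (f[0][a[0]] + f[1][a[1]] + f[2][a[2]]
            + f01[a[0], a[1]] + f12[a[1], a[2]])

best = max(product(range(2), repeat=3), key=Q)   # brute-force joint argmax
print(best, Q(best))
```

The pairwise terms let the factorisation represent coordination patterns (match / anti-match) that a purely additive per-agent decomposition such as VDN cannot express.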
no code implementations • 5 Jun 2019 • Wendelin Böhmer, Tabish Rashid, Shimon Whiteson
This paper investigates the use of intrinsic reward to guide exploration in multi-agent reinforcement learning.
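A common form of intrinsic reward (shown here as a generic count-based novelty bonus, not necessarily the paper's scheme) pays a bonus that decays as a state is revisited:

```python
import numpy as np
from collections import Counter

beta = 1.0          # bonus scale (hypothetical hyperparameter)
counts = Counter()  # visit counts N(s)

def intrinsic_reward(state):
    """Count-based novelty bonus beta / sqrt(N(s))."""
    counts[state] += 1
    return beta / np.sqrt(counts[state])

rewards = [intrinsic_reward("s0") for _ in range(4)]
print(rewards)  # decays: 1, 1/sqrt(2), 1/sqrt(3), 1/2
```

Adding such a bonus to the environment reward steers agents toward rarely visited states, which is particularly delicate in the multi-agent case because each agent's novelty depends on the others' behaviour.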
1 code implementation • 1 Apr 2019 • Maximilian Igl, Andrew Gambardella, Jinke He, Nantas Nardelli, N. Siddharth, Wendelin Böhmer, Shimon Whiteson
We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference.
no code implementations • 22 Dec 2016 • Wendelin Böhmer, Rong Guo, Klaus Obermayer
This paper investigates a type of instability that is linked to the greedy policy improvement in approximated reinforcement learning.
no code implementations • 19 Dec 2014 • Wendelin Böhmer, Klaus Obermayer
Many applications that use empirically estimated functions face a curse of dimensionality, because the integrals over most function classes must be approximated by sampling.
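The contrast behind this curse can be illustrated with Monte Carlo integration (a standard textbook example, not the paper's method): the sampling error scales as \(O(1/\sqrt{n})\) independently of the dimension, whereas a grid of fixed resolution needs exponentially many points in the dimension \(d\).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 100_000

# Integrate f(x) = sum(x) over the unit cube [0, 1]^d; the exact value is d/2.
x = rng.random((n, d))                 # n uniform samples in the cube
estimate = x.sum(axis=1).mean()        # Monte Carlo estimate of the integral
print(estimate)                        # close to d/2 = 5
```

A grid with only 10 points per axis would already require \(10^{10}\) evaluations in \(d = 10\) dimensions; the Monte Carlo estimate above reaches sub-percent error with \(10^5\) samples.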