no code implementations • 1 Mar 2024 • Michal Nauman, Mateusz Ostaszewski, Marek Cygan
VPL uses a small validation buffer to adjust the level of pessimism throughout agent training, setting it so that the approximation error of the critic targets is minimized.
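The idea of tuning pessimism against a validation buffer can be illustrated with a minimal sketch. Everything here is a toy stand-in, not VPL's actual implementation: `q1`/`q2` play the role of two critic heads, `true_return` the role of validation targets, and `beta` a hypothetical pessimism coefficient blending the mean and the minimum of the two critics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: two noisy critic estimates and the "true" returns for a
# small validation buffer (all names are illustrative, not from the paper).
q1 = rng.normal(10.0, 1.0, size=256)
q2 = rng.normal(10.0, 1.0, size=256)
true_return = np.full(256, 10.0)

def pessimistic_target(q1, q2, beta):
    """Blend mean and min of two critics; beta = 1 is fully pessimistic."""
    mean_q = (q1 + q2) / 2
    min_q = np.minimum(q1, q2)
    return (1 - beta) * mean_q + beta * min_q

# Choose the pessimism level that minimizes target error on validation data.
betas = np.linspace(0.0, 1.0, 11)
errors = [np.mean((pessimistic_target(q1, q2, b) - true_return) ** 2)
          for b in betas]
best_beta = betas[int(np.argmin(errors))]
```

In an agent, this selection would be repeated periodically during training, so the pessimism level tracks whatever currently yields the smallest validation error on the critic targets.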
no code implementations • 1 Mar 2024 • Michal Nauman, Michał Bortkiewicz, Mateusz Ostaszewski, Piotr Miłoś, Tomasz Trzciński, Marek Cygan
We evaluated these agents across 14 diverse tasks drawn from two simulation benchmarks.
no code implementations • 30 Oct 2023 • Michal Nauman, Marek Cygan
Risk-aware Reinforcement Learning (RL) algorithms such as SAC and TD3 have been shown empirically to outperform their risk-neutral counterparts on a variety of continuous-action tasks.
1 code implementation • 24 Oct 2022 • Michal Nauman, Marek Cygan
We study the variance of stochastic policy gradients (SPGs) with many action samples per state.
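The effect of using many action samples per state can be sketched with a toy score-function gradient estimator. This is an illustration of the general SPG setup, not the paper's estimator: a 1-D Gaussian policy at a single state, with a hypothetical critic `Q(a) = -a**2`.

```python
import numpy as np

rng = np.random.default_rng(1)

mu = 0.5  # policy mean at the single state considered

def spg_estimate(n_samples):
    """Score-function gradient of E[Q(a)] w.r.t. mu for pi = N(mu, 1)."""
    a = rng.normal(mu, 1.0, size=n_samples)
    q = -a ** 2                # hypothetical critic values
    score = a - mu             # d/dmu log N(a; mu, 1)
    return np.mean(q * score)  # averaged over action samples at this state

def estimator_variance(n_samples, repeats=2000):
    return np.var([spg_estimate(n_samples) for _ in range(repeats)])

var_1 = estimator_variance(1)
var_16 = estimator_variance(16)
# Averaging over more action samples per state shrinks the estimator
# variance, roughly as 1/N for this independent-sample setup.
```

The quantity of interest in the paper is how this per-state variance behaves as the number of action samples grows; the sketch only demonstrates the baseline 1/N behavior of naive averaging.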
1 code implementation • 29 Oct 2020 • Michal Nauman, Floris den Hengst
In WMPG, a world model (WM) is trained online and used to imagine trajectories.
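The imagination step can be sketched as follows. This is a minimal illustration of the idea only: the "world model" here is a hand-coded linear function standing in for a learned network, and all names are hypothetical rather than taken from WMPG's code.

```python
import numpy as np

def world_model(state, action):
    """Stand-in for a learned model predicting next state and reward."""
    next_state = 0.9 * state + 0.1 * action
    reward = -float(np.abs(next_state))
    return next_state, reward

def imagine_trajectory(policy, start_state, horizon=10):
    """Roll out the policy inside the model, never touching the real env."""
    state, traj = start_state, []
    for _ in range(horizon):
        action = policy(state)
        state, reward = world_model(state, action)
        traj.append((state, reward))
    return traj

# Imagined rollout of a simple proportional policy from state 1.0.
traj = imagine_trajectory(lambda s: -s, start_state=1.0)
```

Gradients for policy improvement would then be computed on such imagined trajectories instead of (or in addition to) real environment transitions.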