no code implementations • 19 Sep 2023 • Tuan Dam, Pascal Stenger, Lukas Schneider, Joni Pajarinen, Carlo D'Eramo, Odalric-Ambrym Maillard
We introduce a novel backup operator that computes value nodes as the Wasserstein barycenter of their action-value child nodes, thus propagating the uncertainty of the estimate across the tree to the root node.
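As a rough illustration of the barycenter idea in the snippet above, the following minimal sketch assumes each node holds a one-dimensional Gaussian value estimate and uses the L2-Wasserstein distance, for which the barycenter of Gaussians is again a Gaussian whose mean and standard deviation are the weighted averages of the children's. The `Gaussian` class, the `w2_barycenter` function, and the choice of weights are hypothetical names and assumptions made for this example; the paper's actual operator may use a different distance and weighting.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Gaussian:
    """Hypothetical node estimate: a 1-D Gaussian over the value."""
    mean: float
    std: float


def w2_barycenter(children: Sequence[Gaussian], weights: Sequence[float]) -> Gaussian:
    """L2-Wasserstein barycenter of 1-D Gaussians (illustrative assumption).

    In one dimension the barycenter averages quantile functions, so for
    Gaussians both the mean and the standard deviation are weighted averages.
    """
    if len(children) != len(weights) or not children:
        raise ValueError("children and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalise to a probability vector
    mean = sum(w * c.mean for w, c in zip(norm, children))
    std = sum(w * c.std for w, c in zip(norm, children))
    return Gaussian(mean, std)


# Two action-value children; weights might come from an action-selection policy.
q_children = [Gaussian(1.0, 0.5), Gaussian(3.0, 1.5)]
v_node = w2_barycenter(q_children, weights=[1.0, 3.0])
print(v_node)
```

Because the standard deviation is backed up alongside the mean, the uncertainty of the children carries over to the parent rather than being discarded, which is the propagation effect the snippet describes.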
no code implementations • 11 Feb 2022 • Tuan Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen
In this work, we propose two methods for improving the convergence rate and exploration based on a newly introduced backup operator and entropy regularization.
no code implementations • 1 Jul 2020 • Tuan Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen
Monte-Carlo planning and Reinforcement Learning (RL) are essential to sequential decision making.
no code implementations • 1 Nov 2019 • Tuan Dam, Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen
Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t.