no code implementations • 3 Jun 2024 • Yaniv Oren, Moritz A. Zanger, Pascal R. van der Vaart, Matthijs T. J. Spaan, Wendelin Böhmer
In this work, we propose a general extension to the actor-critic (AC) framework that employs two separate improvement operators: one applied to the policy, in the spirit of policy-based algorithms, and one applied to the value, in the spirit of value-based algorithms. We dub this extension Value-Improved AC (VI-AC).
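The core idea can be sketched as follows. This is an illustrative toy example, not the authors' implementation: in a standard AC update the critic's target is the value of the *current* policy, whereas a value-improvement operator (here, a greedy operator over the critic's Q-estimates, a hypothetical choice) yields a target that is never worse.

```python
import numpy as np

# Illustrative sketch of the value-improvement idea (assumed setup:
# a single state, tabular actor and critic; not the paper's algorithm).
rng = np.random.default_rng(0)
n_actions = 3
q = rng.normal(size=n_actions)   # critic's action-value estimates Q(s, .)
logits = np.zeros(n_actions)     # actor parameters for this state

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

pi = softmax(logits)

# Standard AC value target: expectation of Q under the current policy.
v_actor = float(pi @ q)

# Value-improvement operator in the spirit of value-based methods:
# evaluate under a greedily improved policy instead.
v_improved = float(q.max())

# Since pi is a probability distribution, max_a Q(s, a) >= E_pi[Q(s, a)].
assert v_improved >= v_actor
```

The actor would still be updated with an ordinary policy-gradient step; only the critic's bootstrap target uses the improved policy.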
no code implementations • 21 Oct 2022 • Yaniv Oren, Matthijs T. J. Spaan, Wendelin Böhmer
One of the most well-studied and highly performing planning approaches used in Model-Based Reinforcement Learning (MBRL) is Monte-Carlo Tree Search (MCTS).
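At the heart of MCTS is the tree-search selection rule. A minimal UCT-style selection step (a generic sketch under standard UCB1 assumptions, not the authors' code) looks like this:

```python
import math

# Minimal UCT action selection at one tree node (illustrative sketch).
# values[a]: mean return observed for action a; visits[a]: visit count.
def uct_select(values, visits, c=1.414):
    total = sum(visits)
    best, best_score = None, -math.inf
    for a, (v, n) in enumerate(zip(values, visits)):
        # Unvisited actions get infinite score, so they are tried first;
        # otherwise trade off exploitation (v) vs. exploration bonus.
        score = math.inf if n == 0 else v + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = a, score
    return best

print(uct_select([0.5, 0.6], [10, 0]))  # unvisited action is selected: 1
print(uct_select([0.9, 0.1], [5, 5]))   # equal visits, higher value wins: 0
```

In MBRL, this selection/expansion/backup loop runs inside a learned model of the environment, and the resulting visit counts or values guide the policy.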
Tasks: Model-based Reinforcement Learning • reinforcement-learning +1