no code implementations • 7 Mar 2024 • Fabian Otto, Philipp Becker, Vien Ang Ngo, Gerhard Neumann
Existing off-policy reinforcement learning algorithms typically necessitate an explicit state-action-value function representation, which becomes problematic in high-dimensional action spaces.