no code implementations • ICLR 2022 • DJ Strouse, Kate Baumli, David Warde-Farley, Vlad Mnih, Steven Hansen
However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will necessarily not have seen enough training data to produce accurate and confident skill classifications, leading to low intrinsic reward for the agent and effective penalization of the sort of exploration needed to actually maximize the objective.
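To make the penalty described above concrete, here is a minimal sketch. It assumes the commonly used discriminator-based reward log q(z|s) - log p(z) with a uniform skill prior; the excerpt does not state the exact reward, and the function name `skill_reward` and the probability values are purely illustrative.

```python
import math


def skill_reward(discriminator_probs, skill, num_skills):
    """Assumed intrinsic reward: log q(z|s) - log p(z), with uniform p(z)."""
    return math.log(discriminator_probs[skill]) - math.log(1.0 / num_skills)


num_skills = 4

# Familiar state: the discriminator confidently identifies the active skill.
familiar = [0.91, 0.03, 0.03, 0.03]

# Novel state: with little training data the discriminator is near uniform.
novel = [0.25, 0.25, 0.25, 0.25]

r_familiar = skill_reward(familiar, 0, num_skills)
r_novel = skill_reward(novel, 0, num_skills)

print(f"familiar state reward: {r_familiar:.3f}")
print(f"novel state reward:    {r_novel:.3f}")
```

Under these assumptions the novel state earns no more reward than chance-level classification would, so an agent maximizing this reward is steered back toward states the discriminator already knows.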
no code implementations • ICLR 2019 • Catalin Ionescu, Tejas Kulkarni, Aaron van den Oord, Andriy Mnih, Vlad Mnih
Exploration in environments with sparse rewards is a key challenge for reinforcement learning.
15 code implementations • ICLR 2018 • Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.
Ranked #1 on Atari Games on Atari 2600 Surround
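The idea of parametric noise on the weights can be sketched as a linear layer whose effective weights are mu + sigma * eps, with eps resampled on every forward pass. This is only an illustration in plain Python: the class name `NoisyLinear`, the independent Gaussian noise, and the initial values of mu and sigma are assumptions rather than details taken from the abstract, and in a real agent mu and sigma would be learned by gradient descent, which is omitted here.

```python
import random


class NoisyLinear:
    """Linear layer with effective parameters mu + sigma * eps (illustrative)."""

    def __init__(self, in_features, out_features, sigma_init=0.5, seed=0):
        self.rng = random.Random(seed)
        self.in_features = in_features
        self.out_features = out_features
        bound = in_features ** -0.5  # assumed initialisation scale
        self.mu_w = [[self.rng.uniform(-bound, bound) for _ in range(in_features)]
                     for _ in range(out_features)]
        self.mu_b = [self.rng.uniform(-bound, bound) for _ in range(out_features)]
        # Noise scales; fixed here, trainable in a full implementation.
        self.sigma_w = [[sigma_init * bound] * in_features for _ in range(out_features)]
        self.sigma_b = [sigma_init * bound] * out_features

    def forward(self, x, noisy=True):
        """Compute the output; fresh Gaussian noise is drawn when noisy=True."""
        out = []
        for i in range(self.out_features):
            eps_b = self.rng.gauss(0.0, 1.0) if noisy else 0.0
            total = self.mu_b[i] + self.sigma_b[i] * eps_b
            for j in range(self.in_features):
                eps_w = self.rng.gauss(0.0, 1.0) if noisy else 0.0
                total += (self.mu_w[i][j] + self.sigma_w[i][j] * eps_w) * x[j]
            out.append(total)
        return out


layer = NoisyLinear(in_features=3, out_features=2)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))               # perturbed output
print(layer.forward(x))               # different noise, different output
print(layer.forward(x, noisy=False))  # deterministic output using mu only
```

If such a layer feeds a policy or value head, resampling the noise changes the outputs for the same input, which is the source of the policy stochasticity the abstract refers to.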