Search Results for author: Dailin Hu

Found 4 papers, 1 paper with code

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks

1 code implementation • 16 Sep 2022 • Litian Liang, Yaosheng Xu, Stephen Mcaleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average.

Target Entropy Annealing for Discrete Soft Actor-Critic

no code implementations • 6 Dec 2021 • Yaosheng Xu, Dailin Hu, Litian Liang, Stephen Mcaleer, Pieter Abbeel, Roy Fox

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.

Atari Games, Scheduling

Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning

no code implementations • 28 Nov 2021 • Dailin Hu, Pieter Abbeel, Roy Fox

Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness.

Q-Learning, reinforcement-learning

Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

no code implementations • 28 Oct 2021 • Litian Liang, Yaosheng Xu, Stephen Mcaleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

Under the belief that $\beta$ is closely related to the (state dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty.

Q-Learning, Scheduling
