1 code implementation • 9 May 2023 • Homayoon Farrahi, A. Rupam Mahmood
In this work, we investigate the widely used baseline hyper-parameter values of two policy gradient algorithms, PPO and SAC, across different cycle times.
1 code implementation • 9 Mar 2021 • Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, A. Rupam Mahmood
As a key component of reinforcement learning, the reward function is usually carefully designed to guide the agent.