1 code implementation • 6 Oct 2016 • Mehdi Khamassi, Costas Tzafestas
We apply a meta-learning algorithm that compares short-term and long-term running averages of reward to simultaneously tune $\beta$ and the width of the Gaussian distribution from which continuous action parameters are drawn.
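The idea can be sketched as follows. All names, the multiplicative update rule, and the learning rates below are illustrative assumptions, not the paper's exact method: two exponential moving averages of reward are maintained at different timescales, and their difference drives $\beta$ (assumed here to be a softmax inverse temperature) and the Gaussian width $\sigma$ in opposite directions.

```python
# Hypothetical sketch of meta-learned exploration tuning; parameter names,
# update rule, and learning rates are assumptions, not the authors' exact method.

class ExplorationTuner:
    def __init__(self, beta=1.0, sigma=0.5,
                 alpha_short=0.3, alpha_long=0.05, meta_lr=0.1):
        self.beta = beta            # assumed softmax inverse temperature
        self.sigma = sigma          # std of Gaussian over continuous action params
        self.r_short = 0.0          # short-term reward running average
        self.r_long = 0.0           # long-term reward running average
        self.alpha_short = alpha_short
        self.alpha_long = alpha_long
        self.meta_lr = meta_lr

    def update(self, reward):
        # Exponential moving averages of reward at two timescales.
        self.r_short += self.alpha_short * (reward - self.r_short)
        self.r_long += self.alpha_long * (reward - self.r_long)
        # Positive delta: recent performance beats the long-term baseline,
        # so exploit more (raise beta, shrink sigma); negative: explore more.
        delta = self.r_short - self.r_long
        self.beta = max(1e-3, self.beta * (1.0 + self.meta_lr * delta))
        self.sigma = max(1e-3, self.sigma * (1.0 - self.meta_lr * delta))
        return self.beta, self.sigma
```

With this sketch, a run of improving rewards drives $\beta$ up and $\sigma$ down (more exploitation), while a drop in recent reward relative to the long-term average does the reverse.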