no code implementations • NeurIPS 2023 • Changmin Yu, Neil Burgess, Maneesh Sahani, Samuel J. Gershman
Here we focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards.
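The augmentation described here can be illustrated with a minimal sketch. This is not the paper's method, only the standard pattern it builds on: the agent adds a self-generated novelty bonus (here a simple count-based term) to the external reward, with a decaying weight so the intrinsic contribution is transient. The class name, the inverse-square-root bonus, and the decay schedule are all illustrative assumptions.

```python
from collections import defaultdict

class IntrinsicRewardAgent:
    """Illustrative sketch: transient intrinsic-reward augmentation."""

    def __init__(self, beta=1.0, decay=0.99):
        self.visit_counts = defaultdict(int)  # state -> visit count
        self.beta = beta    # current weight on the intrinsic bonus
        self.decay = decay  # "transient": the weight fades over time

    def augmented_reward(self, state, extrinsic_reward):
        self.visit_counts[state] += 1
        # Count-based novelty bonus: rarely visited states earn more.
        intrinsic = self.visit_counts[state] ** -0.5
        total = extrinsic_reward + self.beta * intrinsic
        self.beta *= self.decay  # intrinsic influence decays each step
        return total
```

With `decay < 1`, repeated visits to the same state yield a shrinking bonus, so behaviour gradually reverts to maximising the external reward alone.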
2 code implementations • 13 Sep 2022 • William I. Walker, Hugo Soulat, Changmin Yu, Maneesh Sahani
We introduce a new approach to probabilistic unsupervised learning based on the recognition-parametrised model (RPM): a normalised semi-parametric hypothesis class for joint distributions over observed and latent variables.
1 code implementation • 12 Sep 2022 • Changmin Yu, Hugo Soulat, Neil Burgess, Maneesh Sahani
A key goal of unsupervised learning is to go beyond density estimation and sample generation to reveal the structure inherent within observed data.
no code implementations • 30 May 2022 • Changmin Yu, David Mguni, Dong Li, Aivar Sootla, Jun Wang, Neil Burgess
Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that maximise expected reward and "explorative" ones that sample unvisited states.
1 code implementation • ICLR 2022 • Changmin Yu, Dong Li, Jianye Hao, Jun Wang, Neil Burgess
We propose learning via retracing, a novel self-supervised approach for learning the state representation (and the associated dynamics model) for reinforcement learning tasks.
no code implementations • 27 Oct 2021 • David Mguni, Usman Islam, Yaqi Sun, Xiuling Zhang, Joel Jennings, Aivar Sootla, Changmin Yu, Ziyan Wang, Jun Wang, Yaodong Yang
In this paper, we introduce a new generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by the safe policy.
no code implementations • ICLR Workshop SSL-RL 2021 • Changmin Yu, Dong Li, Hangyu Mao, Jianye Hao, Neil Burgess
Representation learning is a popular approach for reinforcement learning (RL) tasks with partially observable Markov decision processes.
1 code implementation • NeurIPS 2021 • Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang
We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation.
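The extension described here can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: a linear approximator that conditions on a concatenation of the state and an explicit policy representation, where a conventional VFA would use the state alone. The flattened policy representation and the linear form are assumptions made for brevity.

```python
import numpy as np

class PolicyExtendedVFA:
    """Illustrative sketch of a policy-extended value function V(s, chi_pi)."""

    def __init__(self, state_dim, policy_repr_dim, rng=None):
        rng = rng or np.random.default_rng(0)
        # One weight vector over the joint (state, policy-representation) input.
        self.w = rng.normal(scale=0.1, size=state_dim + policy_repr_dim)

    def value(self, state, policy_repr):
        # A conventional VFA would take only `state`; here the explicit
        # policy representation is part of the input.
        x = np.concatenate([state, policy_repr])
        return float(self.w @ x)

    def td_update(self, state, policy_repr, target, lr=0.1):
        # Standard semi-gradient TD step on the joint input.
        x = np.concatenate([state, policy_repr])
        self.w += lr * (target - self.w @ x) * x
```

Because the policy representation is an input rather than baked into the weights, a single approximator of this form can in principle generalise value estimates across different policies.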
1 code implementation • ICLR 2021 • Changmin Yu, Timothy E. J. Behrens, Neil Burgess
Knowing how the effects of directed actions generalise to new situations (e.g. moving North, South, East and West, or turning left, right, etc.)