1 code implementation • 12 Sep 2023 • Maximilian Li, Xander Davies, Max Nadeau
Language models often exhibit behaviors that improve performance on a pre-training objective but harm performance on downstream tasks.
1 code implementation • 5 Apr 2023 • Moritz Reuss, Maximilian Li, Xiaogang Jia, Rudolf Lioutikov
To the best of our knowledge, this is the first work that (a) represents a behavior policy based on such a decoupled SDM, (b) learns an SDM-based policy in the domain of GCIL, and (c) provides a way to simultaneously learn a goal-dependent and a goal-independent policy from play data.