20 May 2024 • Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, François Fleuret
Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model.
1 Sep 2022 • Vincent Micheli, Eloi Alonso, François Fleuret
Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems.
Ranked #7 on the Atari 100k benchmark.
17 Feb 2022 • Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang, Weijun Hong, Zhongyue Huang, Haicheng Chen, Guangjun Zeng, Yue Lin, Vincent Micheli, Eloi Alonso, François Fleuret, Alexander Nikulin, Yury Belousov, Oleg Svidchenko, Aleksei Shpilman
With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers.
EMNLP 2021 • Vincent Micheli, François Fleuret
Pretrained language models demonstrate strong performance in most NLP tasks when fine-tuned on small task-specific datasets.
13 Apr 2021 • Vincent Micheli, Quentin Heinrich, François Fleuret, Wacim Belblidia
Attention is a key component of the now ubiquitous pre-trained language models.
EMNLP 2020 • Vincent Micheli, Martin d'Hoffschmidt, François Fleuret
Recent advances in language modeling have led to computationally intensive and resource-demanding state-of-the-art models.
Ranked #7 on question answering (FQuAD).
8 Feb 2020 • Vincent Micheli, Karthigan Sinnathamby, François Fleuret
We introduce a new reinforcement learning approach combining a planning quasi-metric (PQM) that estimates the number of steps required to go from any state to another, with task-specific "aimers" that compute a target state to reach a given goal.
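To make the two roles concrete, here is a minimal sketch on a toy deterministic environment where the true step count can be computed exactly by breadth-first search. It is not the paper's method: there the quasi-metric is learned, whereas here it is exact. The graph, `step_distance`, `aim`, and `greedy_path` are hypothetical names chosen for illustration. The distance is asymmetric (a quasi-metric), the aimer picks a target state satisfying a goal, and the agent moves toward that target by descending the distance.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Optional

State = Hashable
Graph = Dict[State, List[State]]  # hypothetical: state -> states reachable in one step


def step_distance(graph: Graph, start: State, end: State) -> float:
    """Exact number of steps from start to end (inf if unreachable), via BFS.
    In general d(a, b) != d(b, a), so this is a quasi-metric, not a metric."""
    if start == end:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in graph.get(node, []):
            if nxt == end:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return float("inf")


def aim(graph: Graph, start: State, goal: Callable[[State], bool]) -> Optional[State]:
    """Hypothetical aimer: among states satisfying the goal (only states listed
    as graph keys are considered), return the one closest to start."""
    candidates = [s for s in graph if goal(s)]
    if not candidates:
        return None
    return min(candidates, key=lambda s: step_distance(graph, start, s))


def greedy_path(graph: Graph, start: State, target: State) -> Optional[List[State]]:
    """Follow the quasi-metric: at each step move to the successor closest to
    target. With exact distances, each move lowers the distance by one."""
    if step_distance(graph, start, target) == float("inf"):
        return None
    path, node = [start], start
    while node != target:
        node = min(graph[node], key=lambda n: step_distance(graph, n, target))
        path.append(node)
    return path


if __name__ == "__main__":
    # One-way ring 0 -> 1 -> 2 -> 3 -> 0: going "forward" is cheap, going "back" is not.
    ring = {0: [1], 1: [2], 2: [3], 3: [0]}
    print(step_distance(ring, 0, 3), step_distance(ring, 3, 0))  # 3 1
    target = aim(ring, 2, lambda s: s % 2 == 1)  # goal: reach any odd state
    print(target, greedy_path(ring, 2, target))  # 3 [2, 3]
```

Separating the goal-agnostic distance from a small goal-specific aimer is what lets one distance model serve many goals; only the aimer changes per task.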