1 code implementation • 24 Mar 2024 • Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall
This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.
1 code implementation • 5 Feb 2024 • Shengyi Huang, Quentin Gallouédec, Florian Felten, Antonin Raffin, Rousslan Fernand Julien Dossa, Yanxiao Zhao, Ryan Sullivan, Viktor Makoviychuk, Denys Makoviichuk, Mohamad H. Danesh, Cyril Roumégous, Jiayi Weng, Chufan Chen, Md Masudur Rahman, João G. M. Araújo, Guorui Quan, Daniel Tan, Timo Klein, Rujikorn Charakorn, Mark Towers, Yann Berthelot, Kinal Mehta, Dipam Chakraborty, Arjun KG, Valentin Charraut, Chang Ye, Zichen Liu, Lucas N. Alegre, Alexander Nikulin, Xiao Hu, Tianlin Liu, Jongwook Choi, Brent Yi
As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone.
1 code implementation • 25 Oct 2023 • Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf
Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment.
Ranked #7 on Zero-Shot Learning on MedConceptsQA
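Distilled DPO trains on pairs of completions that the teacher model ranked as better and worse. The sketch below shows the standard per-pair DPO objective that such training optimizes. It assumes the summed log-probabilities of each completion under the trainable policy and a frozen reference model have already been computed. The function name `dpo_loss` and the default `beta=0.1` are illustrative choices for this sketch and are not taken from the paper's code.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Hypothetical helper for illustration. Each argument is the summed
    log-probability of a full completion given the prompt.
    """
    # How much the policy has moved toward each completion relative to
    # the frozen reference model (the "implicit reward" of DPO).
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) == softplus(-z), computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy still equals the reference model, the margin is zero
# and the loss is log(2); favoring the chosen completion lowers it.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

Minimizing this loss pushes the policy to raise the likelihood of the teacher-preferred completion relative to the rejected one, while `beta` controls how strongly the policy is anchored to the reference model.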
1 code implementation • 29 Sep 2023 • Shengyi Huang, Jiayi Weng, Rujikorn Charakorn, Min Lin, Zhongwen Xu, Santiago Ontañón
Distributed Deep Reinforcement Learning (DRL) aims to leverage more computational resources to train autonomous agents with less training time.
3 code implementations • 21 Jun 2022 • Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu, Shuicheng Yan
EnvPool is open-sourced at https://github.com/sail-sg/envpool.
1 code implementation • 18 May 2022 • Shengyi Huang, Anssi Kanervisto, Antonin Raffin, Weixun Wang, Santiago Ontañón, Rousslan Fernand Julien Dossa
Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms that have been widely used for game AI in recent years.
2 code implementations • 16 Nov 2021 • Shengyi Huang, Rousslan Fernand Julien Dossa, Chang Ye, Jeff Braga
CleanRL is an open-source library that provides high-quality single-file implementations of Deep Reinforcement Learning algorithms.
4 code implementations • 21 May 2021 • Shengyi Huang, Santiago Ontañón, Chris Bamford, Lukasz Grela
In recent years, researchers have achieved great success in applying Deep Reinforcement Learning (DRL) algorithms to Real-time Strategy (RTS) games, creating strong autonomous agents that could defeat professional players in StarCraft II.
no code implementations • 12 Nov 2020 • Chris Bamford, Shengyi Huang, Simon Lucas
In recent years, there have been immense breakthroughs in Game AI research, particularly with Reinforcement Learning (RL).
2 code implementations • 5 Oct 2020 • Shengyi Huang, Santiago Ontañón
Training agents using Reinforcement Learning in games with sparse rewards is a challenging problem, since large amounts of exploration are required to retrieve even the first reward.
2 code implementations • 25 Jun 2020 • Shengyi Huang, Santiago Ontañón
In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games.
3 code implementations • 26 Oct 2019 • Shengyi Huang, Santiago Ontañón
This paper presents a preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games.