1 code implementation • 22 Apr 2024 • Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games.
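CFR builds on regret matching: at each decision point, play each action in proportion to its positive cumulative regret. The sketch below shows only that core update. It assumes the simplest possible setting, a single-decision zero-sum game (rock-paper-scissors), where CFR reduces to plain regret matching in self-play. The helper names (`regret_matching`, `action_values`, `self_play`) and the initial-regret bias are illustrative assumptions and are not taken from the listed paper.

```python
# Row player's payoff PAYOFF[a][b]; actions: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(cumulative_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    n = len(cumulative_regret)
    return [1.0 / n] * n  # no positive regret: fall back to uniform


def action_values(opponent_strategy, player):
    """Expected payoff of each pure action against the opponent's mixed strategy."""
    values = []
    for a in range(3):
        if player == 0:
            v = sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(3))
        else:  # zero-sum: player 1 receives the negated row payoff
            v = sum(-PAYOFF[b][a] * opponent_strategy[b] for b in range(3))
        values.append(v)
    return values


def self_play(iterations, initial_regret=(1.0, 0.0, 0.0)):
    """Run simultaneous regret-matching self-play; return both average strategies."""
    # Hypothetical bias on player 0 so play does not stay trivially uniform.
    regrets = [list(initial_regret), [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        # Both strategies are fixed before either player's regrets are updated.
        strategies = [regret_matching(regrets[p]) for p in (0, 1)]
        for p in (0, 1):
            values = action_values(strategies[1 - p], p)
            expected = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(3):
                regrets[p][a] += values[a] - expected  # instantaneous regret
                strategy_sums[p][a] += strategies[p][a]
    # The average strategy (not the last iterate) is what converges to equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]
```

Calling `self_play(100_000)` should return two average strategies close to the uniform Nash equilibrium `[1/3, 1/3, 1/3]`. The current strategies keep cycling, which is why CFR-style methods report the time-averaged strategy.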
no code implementations • 5 Mar 2024 • Liangzhou Wang, Kaiwen Zhu, Fengming Zhu, Xinghu Yao, Shujie Zhang, Deheng Ye, Haobo Fu, Qiang Fu, Wei Yang
The common goal is an achievable state with high value, which is obtained by sampling from the distribution of future states.
1 code implementation • 4 Feb 2024 • Shuang Wu, Liwen Zhu, Tao Yang, Shiwei Xu, Qiang Fu, Yang Wei, Haobo Fu
This paper presents an innovative framework that integrates Large Language Models (LLMs) with an external Thinker module to enhance the reasoning capabilities of LLM-based agents.
no code implementations • 22 Dec 2023 • Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
Multi-task reinforcement learning endeavors to accomplish a set of different tasks with a single policy.
no code implementations • 2 Dec 2023 • Muyao Zhong, Shengcai Liu, Bingdong Li, Haobo Fu, Ke Tang, Peng Yang
With this advantage, this paper reports, for the first time, results of solving 1000-dimensional TSPs by training a PtrNet on instances of the same dimensionality, which strongly suggests that scaling up the training instances is needed to improve the performance of PtrNet on higher-dimensional COPs.
no code implementations • 10 Oct 2023 • Ren-Jian Wang, Ke Xue, Yutong Wang, Peng Yang, Haobo Fu, Qiang Fu, Chao Qian
DivHF learns a behavior descriptor consistent with human preference by querying human feedback.
1 code implementation • 19 Jun 2023 • Jiarong Liu, Yifan Zhong, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yaodong Yang
We embed cooperative MARL problems into probabilistic graphical models, from which we derive the maximum entropy (MaxEnt) objective for MARL.
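For reference, the standard single-agent maximum-entropy RL objective adds a policy-entropy bonus, weighted by a temperature $\alpha$, to the expected return. The entry above derives a multi-agent counterpart; its exact form is not reproduced here, so this is only the familiar single-agent starting point:

$$
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[ \log \pi(a \mid s) \right]
$$

Here $\rho_\pi$ denotes the state-action distribution induced by $\pi$. Setting $\alpha = 0$ recovers the ordinary expected-return objective.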
no code implementations • 9 Aug 2022 • Ke Xue, Yutong Wang, Cong Guan, Lei Yuan, Haobo Fu, Qiang Fu, Chao Qian, Yang Yu
Generating agents that can achieve zero-shot coordination (ZSC) with unseen partners is a new challenge in cooperative multi-agent reinforcement learning (MARL).
no code implementations • ICLR 2022 • Haobo Fu, Weiming Liu, Shuang Wu, Yijia Wang, Tao Yang, Kai Li, Junliang Xing, Bin Li, Bo Ma, Qiang Fu, Yang Wei
The deep policy gradient method has demonstrated promising results in many large-scale games, where the agent learns purely from its own experience.
1 code implementation • NeurIPS 2021 • Yifan Zang, Jinmin He, Kai Li, Lily Cao, Haobo Fu, Qiang Fu, Junliang Xing
In this paper, we propose a cooperative MARL method with sequential credit assignment (SeCA) that deduces each agent's contribution to the team's success one by one to learn better cooperation.
no code implementations • 18 Feb 2021 • Zhe Wu, Kai Li, Enmin Zhao, Hang Xu, Meng Zhang, Haobo Fu, Bo An, Junliang Xing
In this work, we propose a novel Learning to Exploit (L2E) framework for implicit opponent modeling.
5 code implementations • 10 Oct 2018 • Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, Tong Zhang, Ji Liu, Han Liu
Most existing deep reinforcement learning (DRL) frameworks consider only a discrete action space or only a continuous action space.
1 code implementation • ICLR 2018 • Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Yang Zheng, Lei Han, Haobo Fu, Xiangru Lian, Carson Eisenach, Haichuan Yang, Emmanuel Ekwedike, Bei Peng, Haoyue Gao, Tong Zhang, Ji Liu, Han Liu
Most existing deep reinforcement learning (DRL) frameworks consider action spaces that are either discrete or continuous.