no code implementations • 21 Aug 2020 • Xu He, Bo An, Yanghua Li, Haikai Chen, Qingyu Guo, Xin Li, Zhirong Wang
First, since we are concerned with the reward of a set of recommended items, we model online recommendation as a contextual combinatorial bandit problem and define the reward of a recommended set.
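The excerpt does not give the authors' formulation, so the following is only a minimal sketch of the general idea of a contextual combinatorial bandit. It swaps in a standard shared-parameter LinUCB learner that recommends the top-k items by upper confidence bound. It assumes that each item's feedback is a click in {0, 1} and that the reward of a recommended set is the number of clicked items. The class `CombinatorialLinUCB`, the helper `set_reward`, and the simulated preference vector `true_theta` are hypothetical names for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


def set_reward(clicks):
    """Hypothetical set-level reward: the number of clicked items in the set."""
    return sum(clicks)


class CombinatorialLinUCB:
    """Hypothetical sketch: one linear model shared by all items; the
    recommended set is the k items with the highest UCB scores."""

    def __init__(self, dim, alpha=1.0):
        self.dim = dim
        self.alpha = alpha  # width of the exploration bonus
        # Keep A^{-1} directly; A starts as the identity (ridge regularizer).
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def theta(self):
        # Ridge-regression estimate theta = A^{-1} b.
        return mat_vec(self.A_inv, self.b)

    def select(self, contexts, k):
        """Return the indices of the k items with the highest UCB scores."""
        theta = self.theta()
        scores = []
        for x in contexts:
            width = math.sqrt(max(dot(x, mat_vec(self.A_inv, x)), 0.0))
            scores.append(dot(theta, x) + self.alpha * width)
        return sorted(range(len(contexts)), key=lambda i: scores[i], reverse=True)[:k]

    def update(self, x, r):
        # Sherman-Morrison rank-one update: (A + x x^T)^{-1}.
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(self.dim):
            self.b[i] += r * x[i]


if __name__ == "__main__":
    random.seed(0)
    true_theta = [0.8, 0.1, 0.3]  # hypothetical user preference
    bandit = CombinatorialLinUCB(dim=3, alpha=0.5)
    total_reward = 0.0
    for _ in range(200):
        contexts = [[random.random() for _ in range(3)] for _ in range(10)]
        chosen = bandit.select(contexts, k=3)
        # Simulated clicks: probability grows with the true linear score.
        clicks = [1.0 if random.random() < min(1.0, dot(true_theta, contexts[i])) else 0.0
                  for i in chosen]
        for i, c in zip(chosen, clicks):
            bandit.update(contexts[i], c)
        total_reward += set_reward(clicks)
    print(total_reward, bandit.theta())
```

Here the set reward is simply additive over items, so ranking by per-item UCB maximizes it. A non-additive set reward (for example, one that models interactions between items in the set) would need a different selection oracle.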
no code implementations • 21 Aug 2020 • Xu He, Bo An, Yanghua Li, Haikai Chen, Rundong Wang, Xinrun Wang, Runsheng Yu, Xin Li, Zhirong Wang
Thus, the global policy of the whole page could be sub-optimal.
Multi-agent Reinforcement Learning • Reinforcement Learning (RL)
no code implementations • 17 Sep 2018 • Jun Feng, Heng Li, Minlie Huang, Shichen Liu, Wenwu Ou, Zhirong Wang, Xiaoyan Zhu
The first is a lack of collaboration between scenarios: each strategy maximizes its own objective while ignoring the goals of the other strategies, which leads to sub-optimal overall performance.
Multi-agent Reinforcement Learning • Reinforcement Learning
no code implementations • ICLR 2018 • Chen Xu, Jianqiang Yao, Zhouchen Lin, Wenwu Ou, Yuanbin Cao, Zhirong Wang, Hongbin Zha
Recurrent neural networks have achieved excellent performance in many applications.