no code implementations • 23 May 2024 • Jinxin Liu, Xinghong Guo, Zifeng Zhuang, Donglin Wang
The goal of DIDI is to learn a diverse set of skills from a mixture of label-free offline data.
1 code implementation • 14 May 2024 • Zifeng Zhuang, Dengyun Peng, Jinxin Liu, Ziqi Zhang, Donglin Wang
In this work, we introduce the concept of max-return sequence modeling which integrates the goal of maximizing returns into existing sequence models.
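To make the idea concrete, here is a minimal sketch of one plausible max-return objective: standard action prediction for a Decision Transformer-style sequence model, plus an expectile regression term that biases the predicted return toward the maximum in-distribution return. The loss weighting and the expectile formulation are illustrative assumptions, not necessarily the paper's exact recipe.

```python
import torch

def expectile_loss(pred_return, target_return, tau=0.99):
    # Asymmetric (expectile) regression: with tau near 1, under-predicting
    # the return is penalized far more than over-predicting it, so the model
    # learns to predict roughly the *maximum* return achievable from a state.
    diff = target_return - pred_return
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff ** 2).mean()

def max_return_sequence_loss(pred_actions, target_actions,
                             pred_returns, target_returns):
    # Behavior cloning on actions, as in standard sequence modeling,
    # plus the return-maximizing expectile term (equal weighting assumed).
    action_loss = ((pred_actions - target_actions) ** 2).mean()
    return action_loss + expectile_loss(pred_returns, target_returns)
```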
no code implementations • 29 Jan 2024 • Ziqi Zhang, Jingzehua Xu, Jinxin Liu, Zifeng Zhuang, Donglin Wang, Miao Liu, Shuai Zhang
Offline reinforcement learning (RL) algorithms can learn better decision-making than the behavior policies by stitching suboptimal trajectories together to derive more optimal ones.
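As a toy illustration of stitching (not the paper's algorithm), two suboptimal trajectories that pass through a shared state can be recombined into a trajectory better than either original:

```python
# Two logged trajectories as (state, reward) pairs; both are suboptimal.
traj_a = [("s0", 1.0), ("s2", 0.0), ("s4", 0.0)]  # good start, poor finish
traj_b = [("s1", 0.0), ("s2", 0.0), ("s3", 5.0)]  # poor start, good finish

def stitch(prefix, suffix, shared_state):
    # Follow the prefix up to the shared state, then continue along the suffix.
    i = next(k for k, (s, _) in enumerate(prefix) if s == shared_state)
    j = next(k for k, (s, _) in enumerate(suffix) if s == shared_state)
    return prefix[: i + 1] + suffix[j + 1 :]

ret = lambda traj: sum(r for _, r in traj)
stitched = stitch(traj_a, traj_b, "s2")
print(ret(traj_a), ret(traj_b), ret(stitched))  # 1.0 5.0 6.0
```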
no code implementations • 12 Dec 2023 • Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Jinxin Liu, Donglin Wang, Shuai Zhang
Unlike previous clipping approaches, we treat increasing the maximum cumulative return of a reinforcement learning (RL) task as the task's preference, and propose a bi-level proximal policy optimization (PPO) paradigm that not only optimizes the policy but also dynamically adjusts the clipping bound to reflect this preference, further improving the training outcomes and stability of PPO.
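A minimal sketch of what such a bi-level scheme could look like: the inner level is the standard PPO clipped surrogate, while a hypothetical outer-level rule nudges the clipping bound up when returns improve and down when they regress. The specific adjustment rule and its hyperparameters are assumptions for illustration, not the paper's exact method.

```python
import torch

def ppo_clip_loss(ratio, advantage, eps):
    # Inner level: standard PPO clipped surrogate (negated, to minimize).
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -torch.minimum(unclipped, clipped).mean()

def adjust_clip_bound(eps, last_return, best_return,
                      step=0.01, eps_min=0.05, eps_max=0.4):
    # Outer level (hypothetical rule): widen the bound to allow larger
    # policy updates when returns are improving, shrink it otherwise.
    delta = step if last_return > best_return else -step
    return min(max(eps + delta, eps_min), eps_max)
```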
no code implementations • 10 Nov 2023 • Hongyin Zhang, Diyuan Shi, Zifeng Zhuang, Han Zhao, Zhenyu Wei, Feng Zhao, Sibo Gai, Shangke Lyu, Donglin Wang
Developing robotic intelligent systems that can quickly adapt to unseen situations in the wild is one of the critical challenges in pursuing autonomous robotics.
no code implementations • 7 Oct 2023 • Ziqi Zhang, Xiao Xiong, Zifeng Zhuang, Jinxin Liu, Donglin Wang
Studying how to fine-tune policies pre-trained with offline reinforcement learning (RL) is profoundly significant for enhancing the sample efficiency of RL algorithms.
1 code implementation • 19 Jul 2023 • Yachen Kang, Li He, Jinxin Liu, Zifeng Zhuang, Donglin Wang
Due to the existence of the similarity trap, such consistency regularization improperly increases the probability that the model's predictions are consistent across segment pairs, and thus reduces the confidence in reward learning, since the augmented distribution does not match the original one in PbRL.
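For context, the generic consistency-regularization scheme being critiqued looks roughly like the sketch below: a Bradley-Terry preference model over segment pairs, with an extra term pushing predictions on augmented pairs toward those on the originals. The helper names and the exact regularizer form are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def preference_logits(reward_model, seg_a, seg_b):
    # Bradley-Terry style preference logit: difference of the summed
    # per-step rewards predicted for the two segments.
    return reward_model(seg_a).sum(dim=-1) - reward_model(seg_b).sum(dim=-1)

def consistency_regularizer(reward_model, seg_a, seg_b, aug_a, aug_b):
    # Push the prediction on augmented segment pairs toward the prediction
    # on the originals. Per the abstract, the "similarity trap" can make
    # this backfire when the augmented distribution diverges from the
    # original one.
    p_orig = torch.sigmoid(preference_logits(reward_model, seg_a, seg_b)).detach()
    logit_aug = preference_logits(reward_model, aug_a, aug_b)
    return F.binary_cross_entropy_with_logits(logit_aug, p_orig)
```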
no code implementations • NeurIPS 2023 • Jinxin Liu, Hongyin Zhang, Zifeng Zhuang, Yachen Kang, Donglin Wang, Bin Wang
Naturally, such a paradigm raises three core questions that are not fully answered by prior non-iterative offline RL counterparts such as reward-conditioned policies: (q1) What information should we transfer from the inner level to the outer level?
1 code implementation • 22 Jun 2023 • Jinxin Liu, Ziqi Zhang, Zhenyu Wei, Zifeng Zhuang, Yachen Kang, Sibo Gai, Donglin Wang
Offline reinforcement learning (RL) aims to learn a policy using only pre-collected and fixed data.
no code implementations • 12 Mar 2023 • Min Zhang, Zifeng Zhuang, Zhitao Wang, Donglin Wang, Wenbin Li
OOD exacerbates inconsistencies in the magnitudes and directions of task gradients, making it challenging for GBML to optimize the meta-knowledge by minimizing the sum of task gradients in each minibatch.
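The conflict is easy to quantify: negative cosine similarity between per-task gradients signals directional conflict, and a norm ratio far from 1 signals magnitude inconsistency, both of which make the summed meta-gradient a poor compromise. A two-task diagnostic sketch (illustrative only, not the paper's method):

```python
import torch

def task_gradient_conflict(loss_1, loss_2, params):
    # Flattened per-task gradients w.r.t. the shared meta-parameters.
    g1 = torch.cat([g.reshape(-1) for g in
                    torch.autograd.grad(loss_1, params, retain_graph=True)])
    g2 = torch.cat([g.reshape(-1) for g in
                    torch.autograd.grad(loss_2, params, retain_graph=True)])
    # cos < 0: the two tasks pull the meta-update in opposing directions.
    cos = torch.dot(g1, g2) / (g1.norm() * g2.norm() + 1e-8)
    # norm_ratio far from 1: one task dominates the summed gradient.
    norm_ratio = g1.norm() / (g2.norm() + 1e-8)
    return cos.item(), norm_ratio.item()
```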
2 code implementations • 22 Feb 2023 • Zifeng Zhuang, Kun Lei, Jinxin Liu, Donglin Wang, Yilang Guo
Offline reinforcement learning (RL) is a challenging setting in which existing off-policy actor-critic methods perform poorly due to overestimation of the values of out-of-distribution state-action pairs.
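A toy illustration of the overestimation pathology (not the paper's method): value estimates are unavoidably noisy, and maximizing over many candidate actions systematically selects the largest positive errors, which offline data cannot correct for out-of-distribution actions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(1000)                      # every action is truly worth 0
q_hat = true_q + rng.normal(0.0, 1.0, 1000)  # noisy value estimates

# Evaluating one fixed, in-distribution action is unbiased on average,
# but the max over many actions picks out the largest positive error.
print(q_hat[0])     # close to 0
print(q_hat.max())  # around +3: systematic overestimation
```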