Search Results for author: Jianxiong Li

Found 10 papers, 6 papers with code

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

no code implementations • 28 Feb 2024 • Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan

Multimodal pretraining has emerged as an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progression information; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding.

Contrastive Learning Decision Making +1

Paper
Add Code

Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

1 code implementation • 19 Jan 2024 • Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu

Interestingly, we discover that via reachability analysis of safe-control theory, the hard safety constraint can be equivalently translated to identifying the largest feasible region given the offline dataset.

Offline RL reinforcement-learning

Paper
Code

A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning

no code implementations • 27 Nov 2023 • Jianxiong Li, Shichao Lin, Tianyu Shi, Chujie Tian, Yu Mei, Jian Song, Xianyuan Zhan, Ruimin Li

Specifically, we combine well-established traffic flow theory with machine learning to construct a reward inference model to infer the reward signals from coarse-grained traffic data.

Offline RL Reinforcement Learning (RL)

Paper
Add Code

Query-Policy Misalignment in Preference-Based Reinforcement Learning

no code implementations • 27 May 2023 • Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang

To unravel this mystery, we identify a long-neglected issue in the query selection schemes of existing PbRL studies: Query-Policy Misalignment.

reinforcement-learning

Paper
Add Code

PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning

1 code implementation • 25 May 2023 • Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang

Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance.

Computational Efficiency reinforcement-learning +1

Paper
Code

Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

3 code implementations • 28 Mar 2023 • Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan

This gives a deeper understanding of why the in-sample learning paradigm works, i. e., it applies implicit value regularization to the policy.

D4RL Offline RL +2

Paper
Code

Mind the Gap: Offline Policy Optimization for Imperfect Rewards

1 code implementation • 3 Feb 2023 • Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang

RGM is formulated as a bi-level optimization problem: the upper layer optimizes a reward correction term that performs visitation distribution matching w. r. t.

Reinforcement Learning (RL)

Paper
Code

A Policy-Guided Imitation Approach for Offline Reinforcement Learning

1 code implementation • 15 Oct 2022 • Haoran Xu, Li Jiang, Jianxiong Li, Xianyuan Zhan

We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy.

D4RL Offline RL +3

Paper
Code

When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning

2 code implementations • 23 May 2022 • Jianxiong Li, Xianyuan Zhan, Haoran Xu, Xiangyu Zhu, Jingjing Liu, Ya-Qin Zhang

In offline reinforcement learning (RL), one detrimental issue to policy learning is the error accumulation of deep Q function in out-of-distribution (OOD) areas.

D4RL Offline RL +2

Paper
Code

Offline Reinforcement Learning with Soft Behavior Regularization

no code implementations • 14 Oct 2021 • Haoran Xu, Xianyuan Zhan, Jianxiong Li, Honglei Yin

In this work, we start from the performance difference between the learned policy and the behavior policy, we derive a new policy learning objective that can be used in the offline setting, which corresponds to the advantage function value of the behavior policy, multiplying by a state-marginal density ratio.

Continuous Control reinforcement-learning +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.