no code implementations • 7 May 2024 • Hao Jin, Yang Peng, Liangyu Zhang, Zhihua Zhang
To address the heterogeneity among restricted regions, we first introduce the concept of leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol, the Federated-Q protocol (FedQ), which periodically aggregates the agents' knowledge of their restricted regions and modifies their learning problems accordingly for further training.
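As an illustration of the kind of periodic knowledge aggregation described above, here is a minimal sketch of tabular Q-learning agents whose Q-values over their restricted regions are averaged by a server. The tabular setting, the region masks, the averaging rule, and all hyperparameters are illustrative assumptions, not the paper's actual FedQ protocol.

```python
import numpy as np

def local_q_update(Q, transitions, alpha=0.1, gamma=0.99):
    """One pass of tabular Q-learning over an agent's local transitions."""
    for s, a, r, s_next in transitions:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
    return Q

def aggregate(local_Qs, masks):
    """Server step: for each state, average the Q-rows of the agents whose
    restricted region contains that state (mask[s] == True)."""
    Q_global = np.zeros_like(local_Qs[0])
    for s in range(Q_global.shape[0]):
        rows = [Q[s] for Q, m in zip(local_Qs, masks) if m[s]]
        if rows:
            Q_global[s] = np.mean(rows, axis=0)
    return Q_global

# Toy usage: two agents, four states, two actions, disjoint restricted regions.
n_states, n_actions = 4, 2
masks = [np.array([True, True, False, False]), np.array([False, False, True, True])]
local_Qs = [np.zeros((n_states, n_actions)) for _ in masks]
transitions = [[(0, 1, 1.0, 1), (1, 0, 0.0, 0)], [(2, 0, 0.5, 3), (3, 1, 1.0, 2)]]
for _ in range(10):                                   # communication rounds
    local_Qs = [local_q_update(Q, tr) for Q, tr in zip(local_Qs, transitions)]
    Q_global = aggregate(local_Qs, masks)
    local_Qs = [Q_global.copy() for _ in local_Qs]    # broadcast back
```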
no code implementations • 6 May 2024 • Hao Jin, Liangyu Zhang, Zhihua Zhang
In our setting, we aim to solve a reinforcement learning problem with multiple constraints in which $N$ training agents are located in $N$ different environments, each with limited access to the constraint signals, and are expected to collaboratively learn a policy that satisfies all of the constraints.
no code implementations • 9 Mar 2024 • Yang Peng, Liangyu Zhang, Zhihua Zhang
In the tabular case, \citet{rowland2018analysis} and \citet{rowland2023analysis} proved the asymptotic convergence of two instances of distributional TD, namely the categorical temporal difference (CTD) and quantile temporal difference (QTD) algorithms, respectively.
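For concreteness, the following is a minimal sketch of the categorical projection step that underlies CTD for policy evaluation; the fixed support, step size, and single-transition update are illustrative assumptions rather than the exact setup analyzed in the paper.

```python
import numpy as np

def categorical_td_update(probs, s, r, s_next, z, gamma=0.99, alpha=0.1):
    """One CTD update for a single transition (s, r, s_next). probs has shape
    [n_states, K] and stores a categorical return distribution per state over
    the fixed, evenly spaced support z."""
    K, v_min, v_max, dz = len(z), z[0], z[-1], z[1] - z[0]
    target = np.zeros(K)
    # Apply the Bellman map to each atom of the next-state distribution and
    # project its mass onto the two nearest support points.
    for p_j, z_j in zip(probs[s_next], z):
        tz = np.clip(r + gamma * z_j, v_min, v_max)
        b = (tz - v_min) / dz
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if hi == lo:
            target[lo] += p_j
        else:
            target[lo] += p_j * (hi - b)
            target[hi] += p_j * (b - lo)
    probs[s] = (1 - alpha) * probs[s] + alpha * target
    return probs

# Toy usage: 2 states, 51 atoms on [0, 10], uniform initial distributions.
z = np.linspace(0.0, 10.0, 51)
probs = np.full((2, 51), 1.0 / 51)
probs = categorical_td_update(probs, s=0, r=1.0, s_next=1, z=z)
```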
1 code implementation • 29 Sep 2023 • Liangyu Zhang, Yang Peng, Jiadong Liang, Wenhao Yang, Zhihua Zhang
This implies that the distributional policy evaluation problem can be solved in a sample-efficient manner.
Distributional Reinforcement Learning • Reinforcement Learning
1 code implementation • 29 Apr 2023 • Liangyu Zhang, Yang Peng, Wenhao Yang, Zhihua Zhang
To the best of our knowledge, we are the first to apply tools from semi-infinite programming (SIP) to solve constrained reinforcement learning problems.
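Since the excerpt only names the SIP toolkit, below is a generic sketch of the exchange (cutting-plane) scheme commonly used for semi-infinite programs, run on a toy linear problem; it is not the paper's constrained-RL formulation, and the grid, bounds, and tolerance are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def solve_sip_exchange(c, a_fn, b_fn, t_grid, n_rounds=20, tol=1e-8):
    """Minimize c^T x subject to a_fn(t)^T x <= b_fn(t) for every t in the grid.
    Start from a small active set of constraints and repeatedly add the most
    violated one until no constraint on the grid is violated."""
    active = [t_grid[0], t_grid[-1]]          # initial finite constraint set
    for _ in range(n_rounds):
        A = np.array([a_fn(t) for t in active])
        b = np.array([b_fn(t) for t in active])
        x = linprog(c, A_ub=A, b_ub=b, bounds=[(-10, 10)] * len(c)).x
        # Find the most violated constraint over a fine grid of t values.
        violations = np.array([a_fn(t) @ x - b_fn(t) for t in t_grid])
        worst = int(np.argmax(violations))
        if violations[worst] <= tol:
            return x                           # feasible on the whole grid
        active.append(t_grid[worst])
    return x

# Toy usage: minimize -x1 - x2 s.t. t*x1 + (1-t)*x2 <= 1 for all t in [0, 1].
x = solve_sip_exchange(
    c=np.array([-1.0, -1.0]),
    a_fn=lambda t: np.array([t, 1 - t]),
    b_fn=lambda t: 1.0,
    t_grid=np.linspace(0, 1, 101),
)
```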
no code implementations • 12 Sep 2022 • Miao Lu, Wenhao Yang, Liangyu Zhang, Zhihua Zhang
Specifically, we propose a two-stage estimator based on instrumental variables and establish its statistical properties in confounded MDPs with a linear structure.
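As background for the two-stage idea, here is a minimal sketch of generic two-stage least squares (2SLS) on synthetic confounded data. The data-generating process, dimensions, and coefficients are illustrative assumptions; the paper's estimator for confounded linear MDPs is more involved than this regression example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=(n, 1))              # instrument
u = rng.normal(size=(n, 1))              # unobserved confounder
x = 0.8 * z + 0.6 * u + 0.1 * rng.normal(size=(n, 1))   # endogenous regressor
y = 2.0 * x + 1.5 * u + 0.1 * rng.normal(size=(n, 1))   # outcome (true effect = 2)

# Stage 1: regress the endogenous regressor on the instrument.
beta_zx = np.linalg.lstsq(z, x, rcond=None)[0]
x_hat = z @ beta_zx

# Stage 2: regress the outcome on the fitted regressor from stage 1.
beta_iv = np.linalg.lstsq(x_hat, y, rcond=None)[0]
print("2SLS estimate of the causal effect:", beta_iv.ravel()[0])  # close to 2.0
```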
no code implementations • 9 May 2021 • Wenhao Yang, Liangyu Zhang, Zhihua Zhang
In this paper, we study the non-asymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are estimated solely from a generative model.
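To make the setting concrete, below is a minimal sketch of robust value iteration on an empirical kernel estimated from generative-model samples, using an sa-rectangular contamination uncertainty set. The uncertainty set, sample counts, and toy MDP interface are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

def estimate_kernel(sample_next_state, S, A, n_samples=200, seed=0):
    """Build an empirical kernel P_hat[s, a, s'] by querying a generative model
    `sample_next_state(s, a, rng)` n_samples times per state-action pair."""
    rng = np.random.default_rng(seed)
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n_samples):
                P_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    return P_hat / n_samples

def robust_value_iteration(P_hat, R, gamma=0.9, delta=0.1, n_iters=500):
    """Robust value iteration with the contamination set
    {(1 - delta) * P_hat[s, a] + delta * q : q any distribution}, whose
    worst-case next-state value is (1 - delta) * P_hat[s, a] @ V + delta * min(V)."""
    V = np.zeros(P_hat.shape[0])
    for _ in range(n_iters):
        Q = R + gamma * ((1 - delta) * P_hat @ V + delta * V.min())
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

# Toy usage: a random 3-state, 2-action MDP acts as the "generative model".
S, A = 3, 2
P_true = np.random.default_rng(1).dirichlet(np.ones(S), size=(S, A))
R = np.random.default_rng(2).uniform(size=(S, A))
P_hat = estimate_kernel(lambda s, a, rng: rng.choice(S, p=P_true[s, a]), S, A)
V_robust, policy = robust_value_iteration(P_hat, R)
```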
no code implementations • 1 Jan 2021 • Jiadong Liang, Liangyu Zhang, Cheng Zhang, Zhihua Zhang
In this paper, we propose a novel approach to stabilizing the training process of Generative Adversarial Networks and alleviating the mode collapse problem.
no code implementations • 9 Aug 2020 • Jiadong Liang, Liangyu Zhang, Cheng Zhang, Zhihua Zhang
In this paper, we propose a novel approach to stabilizing the training process of Generative Adversarial Networks and alleviating the mode collapse problem.