no code implementations • 23 May 2024 • Dengwang Tang, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo
We provide a theoretical upper bound on the mis-identification probability (of the support of the best mixed arm) and show that it decays exponentially in the budget $N$, at a rate governed by constants that characterize the hardness of the problem instance.
no code implementations • 17 Oct 2023 • Dengwang Tang, Rahul Jain, Botao Hao, Zheng Wen
In this paper, we study the problem of efficient online reinforcement learning in the infinite horizon setting when there is an offline dataset to start with.
no code implementations • 16 Oct 2023 • Dengwang Tang, Dongze Ye, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo
We propose a Posterior Sampling-based reinforcement learning algorithm for POMDPs (PS4POMDPs), which is much simpler and more implementable compared to state-of-the-art optimism-based online learning algorithms for POMDPs.
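The core loop behind posterior-sampling RL is simple to state: sample a model from the current posterior, plan optimally in the sampled model, act, and update the posterior. The sketch below illustrates that principle on a tiny, fully observed two-state MDP; it is not PS4POMDPs itself (which operates on POMDPs and maintains a posterior over partially observed models), and all model parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (unknown to the learner) 2-state, 2-action MDP.
P_true = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.7, 0.3], [0.05, 0.95]]])  # P_true[s, a, s']
R = np.array([[0.0, 0.0], [0.0, 1.0]])           # known reward r(s, a)

nS, nA, H = 2, 2, 10
counts = np.ones((nS, nA, nS))  # Dirichlet(1, 1) prior over each row P[s, a, :]

def solve(P, R, H):
    """Finite-horizon value iteration; returns a greedy policy per step."""
    V = np.zeros(nS)
    pi = np.zeros((H, nS), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V          # Q[s, a] = r(s, a) + sum_s' P[s, a, s'] V[s']
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

for ep in range(200):
    # 1) Sample a transition model from the posterior.
    P_samp = np.array([[rng.dirichlet(counts[s, a]) for a in range(nA)]
                       for s in range(nS)])
    # 2) Plan in the sampled model, 3) act for one episode, 4) update posterior.
    pi = solve(P_samp, R, H)
    s = 0
    for h in range(H):
        a = pi[h, s]
        s_next = rng.choice(nS, p=P_true[s, a])
        counts[s, a, s_next] += 1  # conjugate Dirichlet update
        s = s_next
```

The appeal noted in the abstract is visible even in this toy version: each step is a sample, a plan, and a count update, with no optimism bonuses or confidence-set constructions to maintain.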
1 code implementation • 10 Apr 2023 • Dengwang Tang, Ashutosh Nayyar, Rahul Jain

The Common Information (CI) approach provides a systematic way to transform a multi-agent stochastic control problem into a single-agent partially observable Markov decision process (POMDP) called the coordinator's POMDP.
no code implementations • 20 Mar 2023 • Botao Hao, Rahul Jain, Dengwang Tang, Zheng Wen
We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses the offline dataset, and information about the expert's behavioral policy used to generate the offline dataset.
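One way to read "informed" posterior sampling is that the offline dataset is folded into the prior before online learning begins, so the learner starts from a posterior already concentrated where the expert's data is informative. The sketch below illustrates that idea on a two-armed Bernoulli bandit with conjugate Beta priors; it is a minimal stand-in, not the iPSRL algorithm (which is a full RL method and also exploits knowledge of the expert's behavioral policy), and the arm means and dataset sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
true_means = [0.3, 0.7]  # hypothetical two-armed Bernoulli bandit

# Offline dataset of (arm, reward) pairs, generated by an "expert" who
# mostly pulls the better arm.
offline = [(1, rng.binomial(1, true_means[1])) for _ in range(50)] + \
          [(0, rng.binomial(1, true_means[0])) for _ in range(5)]

# Informed prior: start from uniform Beta(1, 1) and fold in the offline data.
alpha, beta = np.ones(2), np.ones(2)
for arm, r in offline:
    alpha[arm] += r
    beta[arm] += 1 - r

# Online phase: standard Thompson sampling from the informed posterior.
pulls = np.zeros(2, dtype=int)
for t in range(500):
    theta = rng.beta(alpha, beta)  # sample a mean estimate for each arm
    arm = int(theta.argmax())      # act greedily w.r.t. the sampled means
    r = rng.binomial(1, true_means[arm])
    alpha[arm] += r                # conjugate Beta update
    beta[arm] += 1 - r
    pulls[arm] += 1
```

Because the offline data tightens the posterior on the well-explored arm before the first online pull, the sampler wastes far fewer early rounds on exploration than it would starting from the uninformative prior alone.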