no code implementations • 26 May 2024 • Yilei Chen, Vittorio Giammarino, James Queeney, Ioannis Ch. Paschalidis
Adversarial Imitation Learning (AIL) faces challenges with sample inefficiency because of its reliance on sufficient on-policy data to evaluate the performance of the current policy during reward function updates.
no code implementations • 29 Mar 2024 • Yilei Chen, Aldo Pacchiano, Ioannis Ch. Paschalidis
Up to lower-order and logarithmic terms, $\mathrm{CAESAR}$ achieves a sample complexity of $\tilde{O}\left(\frac{H^4}{\epsilon^2}\sum_{h=1}^H\max_{k\in[K]}\sum_{s, a}\frac{(d_h^{\pi^k}(s, a))^2}{\mu^*_h(s, a)}\right)$, where $d^{\pi}$ is the visitation distribution of policy $\pi$, $\mu^*$ is the optimal sampling distribution, and $H$ is the horizon.
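To make the structure of this bound concrete, the following is a minimal sketch that evaluates only its leading term for tabular inputs. It is not taken from the paper: the function name `leading_term`, the nested-list layout of `d` and `mu`, and the handling of zero-probability entries are all assumptions made for illustration. Constants, logarithmic factors, and lower-order terms are ignored, and the sampling distribution is supplied by the caller rather than optimized.

```python
import math


def leading_term(d, mu, horizon, eps):
    """Evaluate (H^4 / eps^2) * sum_h max_k sum_{s,a} d[k][h][sa]^2 / mu[h][sa].

    Assumed (hypothetical) layout:
      d[k][h]  : list of visitation probabilities of policy k at step h,
                 one entry per flattened (state, action) pair
      mu[h]    : list of sampling probabilities at step h, same ordering
    """
    total = 0.0
    for h in range(horizon):
        worst = 0.0  # max over the K target policies at step h
        for d_k in d:
            ratio = 0.0
            for d_sa, mu_sa in zip(d_k[h], mu[h]):
                if d_sa == 0.0:
                    continue  # pairs the policy never visits contribute nothing
                if mu_sa == 0.0:
                    ratio = math.inf  # visited by the policy but never sampled
                    break
                ratio += d_sa ** 2 / mu_sa
            worst = max(worst, ratio)
        total += worst
    return horizon ** 4 / eps ** 2 * total


# Two deterministic-at-step-0 policies, horizon 2, two (state, action) pairs.
d = [
    [[1.0, 0.0], [0.5, 0.5]],  # policy 1
    [[0.0, 1.0], [0.5, 0.5]],  # policy 2
]
mu = [[0.5, 0.5], [0.5, 0.5]]  # uniform sampling distribution (an assumption)
value = leading_term(d, mu, horizon=2, eps=0.5)
```

In this toy case the inner sum is largest at the step where the two policies disagree, which illustrates why the term depends on how well a single sampling distribution covers all $K$ visitation distributions at once.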
1 code implementation • 14 Mar 2023 • Haozhe Jiang, Kaiyue Wen, Yilei Chen
For some settings, we also provide theory that explains the rationale behind the design of our models.