no code implementations • 21 Mar 2024 • Kimon Protopapas, Anas Barakat
In this work, we propose a new class of PMD algorithms, called $h$-PMD, which incorporates multi-step greedy policy improvement with lookahead depth $h$ into the PMD update rule.
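A minimal tabular sketch of the idea, not the paper's exact algorithm: evaluate the current policy, push its value through $h-1$ Bellman optimality operators (the depth-$h$ lookahead), and take a KL-regularized mirror-descent (softmax) policy step. The function name `h_pmd_step`, the stepsize `eta`, and the choice of the softmax/KL mirror map are assumptions for illustration.

```python
import numpy as np

def bellman_opt(V, r, P, gamma):
    # (T V)(s) = max_a [ r(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    return np.max(r + gamma * np.einsum('sap,p->sa', P, V), axis=1)

def h_pmd_step(pi, r, P, gamma, h=2, eta=1.0):
    """One sketched h-PMD iteration on a tabular MDP.

    pi: (S, A) policy, r: (S, A) rewards, P: (S, A, S) transitions.
    """
    S, A = pi.shape
    # Exact policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi
    P_pi = np.einsum('sa,sap->sp', pi, P)
    r_pi = np.einsum('sa,sa->s', pi, r)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    # Depth-h lookahead: apply the Bellman optimality operator h-1 times
    for _ in range(h - 1):
        V = bellman_opt(V, r, P, gamma)
    Q = r + gamma * np.einsum('sap,p->sa', P, V)
    # KL mirror-descent step: pi_new(a|s) proportional to pi(a|s) * exp(eta * Q(s,a))
    logits = np.log(pi + 1e-12) + eta * Q
    pi_new = np.exp(logits - logits.max(axis=1, keepdims=True))
    return pi_new / pi_new.sum(axis=1, keepdims=True)
```

Setting `h=1` recovers a standard PMD step, since the loop is skipped and `Q` is the current policy's Q-function.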
1 code implementation • 27 Feb 2024 • Philip Jordan, Anas Barakat, Niao He
We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria: Each agent observes their own actions and rewards, along with a shared state.
1 code implementation • 8 Sep 2023 • Jiduan Wu, Anas Barakat, Ilyas Fatkhullin, Niao He
Our main results are two-fold: (i) in the deterministic setting, we establish the first global last-iterate linear convergence result for the nested algorithm that seeks the NE of zero-sum LQ games; (ii) in the model-free setting, we establish an $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity using a single-point zeroth-order (ZO) estimator.
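For context, a single-point zeroth-order estimator builds a gradient estimate from one function evaluation at a randomly perturbed point. The sketch below is the standard sphere-smoothing construction, $g = (d/\delta)\, f(x + \delta u)\, u$ with $u$ uniform on the unit sphere, which is an unbiased estimate of the gradient of a smoothed version of $f$; the smoothing radius `delta` and function names are illustrative, not necessarily the paper's exact choices.

```python
import numpy as np

def zo_single_point_grad(f, x, delta=1e-2, rng=None):
    """Single-point zeroth-order gradient estimate (sketch).

    g = (d / delta) * f(x + delta * u) * u, with u uniform on the unit sphere.
    Uses only one evaluation of f, as in single-point (vs. two-point) schemes.
    """
    rng = rng or np.random.default_rng()
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)          # uniform direction on the sphere
    return (d / delta) * f(x + delta * u) * u
```

Averaging many such estimates recovers the (smoothed) gradient; a single draw is noisy, which is why single-point schemes typically pay a higher sample complexity than two-point ones.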
no code implementations • 2 Jun 2023 • Anas Barakat, Ilyas Fatkhullin, Niao He
We consider the reinforcement learning (RL) problem with general utilities, which consists of maximizing a function of the state-action occupancy measure.
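To make the objective concrete, here is a small tabular sketch (names and the entropy utility are illustrative assumptions): compute the discounted state-action occupancy measure of a fixed policy by solving a linear system, then evaluate a nonlinear utility of it. Standard RL is recovered when the utility is linear, $F(\lambda) = \langle \lambda, r \rangle$.

```python
import numpy as np

S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
mu0 = np.ones(S) / S                          # initial state distribution
pi = np.ones((S, A)) / A                      # a fixed (uniform) policy

# State transition matrix under pi: P_pi[s, s'] = sum_a pi(a|s) P(s'|s,a)
P_pi = np.einsum('sa,sap->sp', pi, P)

# Discounted state occupancy solves d = (1 - gamma) * mu0 + gamma * P_pi^T d
d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)

# State-action occupancy measure: lambda(s, a) = d(s) * pi(a|s); sums to 1
lam = d[:, None] * pi

# A general (nonlinear) utility of the occupancy measure, e.g. its entropy,
# as used in pure-exploration objectives
F = -np.sum(lam * np.log(lam + 1e-12))
```

Because $F$ is a nonlinear function of $\lambda$ rather than an expected cumulative reward, the usual Bellman machinery no longer applies directly, which is what makes this setting harder than standard RL.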
no code implementations • 3 Feb 2023 • Ilyas Fatkhullin, Anas Barakat, Anastasia Kireeva, Niao He
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations.
no code implementations • 14 Jun 2021 • Anas Barakat, Pascal Bianchi, Julien Lehmann
Actor-critic methods integrating target networks have exhibited remarkable empirical success in deep reinforcement learning.
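The stabilizing device in question is typically a Polyak (soft) target update: the target network slowly tracks the online network. A minimal sketch, with `tau` a hypothetical name for the mixing rate:

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging for target networks (sketch).

    Each target parameter moves a small fraction tau toward its online
    counterpart, so the target changes slowly and stabilizes bootstrapping.
    """
    return [(1 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

With small `tau`, the target lags the online network by many updates, which is precisely the coupling that makes the convergence analysis of such methods delicate.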
no code implementations • 18 Nov 2019 • Anas Barakat, Pascal Bianchi
In this work, we study the ADAM algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate.
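For reference, one ADAM iteration in its standard form; the effective stepsize `lr / (sqrt(v_hat) + eps)` is the adaptive learning rate that the boundedness assumption constrains. Parameter names follow common convention and are illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update (t is the 1-based iteration counter)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)                  # bias corrections
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On a smooth nonconvex objective the iterates are only guaranteed to approach stationary points; on a simple quadratic the update drives the iterate toward the minimizer.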
no code implementations • 25 Sep 2019 • Anas Barakat, Pascal Bianchi
In this work, we study the ADAM algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate.
no code implementations • 4 Oct 2018 • Anas Barakat, Pascal Bianchi
In the constant stepsize regime, assuming that the objective function is differentiable and nonconvex, we establish the long-run convergence of the iterates to a stationary point under a stability condition.