no code implementations • 6 Mar 2024 • Antoine Scheid, Daniil Tiapkin, Etienne Boursier, Aymeric Capitaine, El Mahdi El Mhamdi, Eric Moulines, Michael I. Jordan, Alain Durmus
This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent.
no code implementations • 26 Oct 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard
In particular, we study demonstration-regularized reinforcement learning, which leverages expert demonstrations via KL-regularization toward a policy learned by behavior cloning.
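The core idea can be illustrated on a one-step toy problem (numbers and names below are our own, not the paper's): maximizing expected reward minus a KL penalty toward a behavior-cloning policy has the closed-form solution $\pi^*(a) \propto \pi_{\mathrm{BC}}(a)\,e^{r(a)/\lambda}$.

```python
import numpy as np

# Toy illustration of KL-regularization toward a behavior-cloning policy:
# for a one-step problem, max_pi E_pi[r] - lam * KL(pi || pi_bc) has the
# closed form pi*(a) proportional to pi_bc(a) * exp(r(a) / lam).
r = np.array([1.0, 0.5, 0.0])       # rewards per action (made up)
pi_bc = np.array([0.2, 0.5, 0.3])   # policy obtained by behavior cloning
lam = 0.5                           # regularization strength

pi_star = pi_bc * np.exp(r / lam)
pi_star /= pi_star.sum()            # normalize to a probability distribution

def objective(pi):
    # expected reward minus lam times KL(pi || pi_bc)
    return pi @ r - lam * np.sum(pi * np.log(pi / pi_bc))
```

A larger $\lambda$ keeps $\pi^*$ closer to the demonstrations; $\lambda \to 0$ recovers the greedy reward maximizer.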
no code implementations • 22 Oct 2023 • Sergey Samsonov, Daniil Tiapkin, Alexey Naumov, Eric Moulines
In this paper, we consider the problem of obtaining sharp bounds on the performance of temporal difference (TD) methods with linear function approximation for policy evaluation in discounted Markov decision processes.
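For context, the method under study can be sketched as follows — a minimal TD(0) policy-evaluation loop on a made-up 3-state Markov reward process with one-hot features (the tabular special case of linear function approximation); this is an illustration of the setting, not the paper's analysis.

```python
import numpy as np

# TD(0) with linear function approximation on a toy Markov reward process.
np.random.seed(0)
P = np.array([[0.5, 0.5, 0.0],   # transition matrix (made-up example)
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, 2.0])    # deterministic reward per state
gamma = 0.9

# Ground truth for comparison: V = (I - gamma * P)^{-1} r
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

phi = np.eye(3)                  # one-hot features => tabular special case
w = np.zeros(3)                  # linear weights, V_hat(s) = phi[s] @ w
s = 0
for t in range(100_000):
    alpha = 1.0 / (1 + t) ** 0.7                       # decaying step size
    s_next = np.random.choice(3, p=P[s])
    td_error = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w += alpha * td_error * phi[s]
    s = s_next
```

With one-hot features the iterates converge to `V_true`; with general features they converge to the projected fixed point instead.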
1 code implementation • 19 Oct 2023 • Daniil Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov
We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure.
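One way to see the connection on a toy example (the DAG and rewards below are ours): log-flows satisfy a soft, logsumexp Bellman recursion, and the induced forward policy samples terminal states with probability proportional to reward.

```python
import numpy as np

# Toy sketch of the GFlowNet <-> entropy-regularized RL correspondence:
# on a DAG, log F(s) = logsumexp over children of log F(child), with
# terminal log-flows set to log-rewards (made-up example).
children = {"s0": ["s1", "s2"], "s1": ["x1", "x2"], "s2": ["x3", "x4"]}
log_reward = {"x1": np.log(1.0), "x2": np.log(2.0),
              "x3": np.log(3.0), "x4": np.log(4.0)}

log_F = dict(log_reward)           # terminal flows equal rewards
for s in ["s1", "s2", "s0"]:       # reverse topological order
    log_F[s] = np.logaddexp.reduce(np.array([log_F[c] for c in children[s]]))

def terminal_probs():
    # Forward policy P(child | s) = F(child) / F(s); accumulate path probs.
    probs = {}
    def walk(s, logp):
        if s not in children:
            probs[s] = probs.get(s, 0.0) + np.exp(logp)
            return
        for c in children[s]:
            walk(c, logp + log_F[c] - log_F[s])
    walk("s0", 0.0)
    return probs

p = terminal_probs()
# Each terminal state is sampled with probability proportional to its reward.
```

Here total reward is 10, so e.g. `x4` is reached with probability 0.4.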
no code implementations • 6 Apr 2023 • Denis Belomestny, Pierre Ménard, Alexey Naumov, Daniil Tiapkin, Michal Valko
These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum.
no code implementations • 16 Mar 2023 • Sholom Schechtman, Daniil Tiapkin, Michael Muehlebach, Eric Moulines
We consider the problem of minimizing a non-convex function over a smooth manifold $\mathcal{M}$.
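A standard baseline for this setting, shown here as a sketch on a made-up instance (not the paper's algorithm), is Riemannian gradient descent on the unit sphere: project the Euclidean gradient onto the tangent space, step, then retract back to the manifold. For $f(x) = x^\top A x$, the minimum over the sphere is the smallest eigenvalue of $A$.

```python
import numpy as np

# Riemannian gradient descent on the unit sphere for f(x) = x^T A x.
np.random.seed(1)
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])   # symmetric matrix (made-up example)

x = np.random.randn(3)
x /= np.linalg.norm(x)           # start on the sphere
for _ in range(2000):
    g = 2 * A @ x                # Euclidean gradient of f
    g_tan = g - (g @ x) * x      # project onto the tangent space at x
    x = x - 0.05 * g_tan         # gradient step in the tangent space
    x /= np.linalg.norm(x)       # retraction: renormalize onto the sphere

f_val = x @ A @ x
lam_min = np.linalg.eigvalsh(A).min()
```

For generic initialization this converges to an eigenvector of the smallest eigenvalue, so `f_val` approaches `lam_min`.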
1 code implementation • 14 Mar 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Ménard
Finally, we apply developed regularization techniques to reduce sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.
1 code implementation • 28 Sep 2022 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Ménard
We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions.
no code implementations • 16 May 2022 • Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Ménard
We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision processes: a natural extension of the Bayes-UCB algorithm of Kaufmann et al. (2012) for multi-armed bandits.
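The bandit ancestor is easy to sketch (the arm means below are made up, and this is the bandit algorithm, not Bayes-UCBVI itself): with Beta posteriors over Bernoulli arms, pull the arm whose posterior $(1 - 1/t)$-quantile is largest.

```python
import numpy as np
from scipy.stats import beta

# Bayes-UCB (Kaufmann et al., 2012) for Bernoulli multi-armed bandits.
np.random.seed(0)
means = [0.2, 0.5, 0.8]            # true, unknown arm means (made up)
succ = np.ones(3)                  # Beta(1, 1) prior: success counts + 1
fail = np.ones(3)                  # Beta(1, 1) prior: failure counts + 1
counts = np.zeros(3, dtype=int)

for t in range(1, 2001):
    # Index of each arm: upper (1 - 1/t)-quantile of its Beta posterior.
    q = beta.ppf(1 - 1.0 / t, succ, fail)
    a = int(np.argmax(q))
    reward = np.random.rand() < means[a]
    succ[a] += reward
    fail[a] += 1 - reward
    counts[a] += 1
```

Bayes-UCBVI replaces the Beta posterior quantile over an arm's mean with a quantile over optimistic value estimates in each state of the MDP.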
no code implementations • 27 Feb 2021 • Daniil Tiapkin, Alexander Gasnikov
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
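The classical baseline for this problem is value iteration: repeatedly apply the Bellman optimality operator, which is a $\gamma$-contraction, until the values converge. A minimal sketch on a made-up 2-state, 2-action MDP:

```python
import numpy as np

# Value iteration for an infinite-horizon discounted MDP (toy example).
P = np.array([                  # P[a, s, s']: transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],   # under action 0
    [[0.5, 0.5], [0.0, 1.0]],   # under action 1
])
R = np.array([[0.0, 1.0],       # R[a, s]: expected one-step reward
              [0.5, 2.0]])
gamma = 0.95

V = np.zeros(2)
for _ in range(2000):
    Q = R + gamma * P @ V       # Q[a, s]: one Bellman backup per action
    V_new = Q.max(axis=0)       # Bellman optimality operator
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
policy = Q.argmax(axis=0)       # greedy policy w.r.t. the converged values
```

The fixed point satisfies $V^*(s) = \max_a \big[R(s,a) + \gamma \sum_{s'} P(s'\mid s,a)\,V^*(s')\big]$, and the greedy policy with respect to $V^*$ is optimal.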
no code implementations • 9 Oct 2020 • Darina Dvinskikh, Daniil Tiapkin
In this paper, we focus on computational aspects of the Wasserstein barycenter problem.
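One widely used computational approach (shown here on a made-up 1-D instance; the paper's contribution is its complexity analysis, not this specific routine) is the entropic-regularized barycenter computed by iterative Bregman projections with a Gibbs kernel.

```python
import numpy as np

# Entropic-regularized Wasserstein barycenter of two 1-D histograms,
# computed by iterative Bregman projections (Sinkhorn-style updates).
n = 50
x = np.linspace(0, 1, n)
C = (x[:, None] - x[None, :]) ** 2   # squared-distance cost on the grid
eps = 1e-2                           # entropic regularization strength
K = np.exp(-C / eps)                 # Gibbs kernel

def gaussian(mu, sig):
    p = np.exp(-(x - mu) ** 2 / (2 * sig ** 2))
    return p / p.sum()

mus = [gaussian(0.25, 0.05), gaussian(0.75, 0.05)]  # input measures
w = [0.5, 0.5]                                      # barycenter weights

v = [np.ones(n) for _ in mus]
for _ in range(500):
    u = [m / (K @ vk) for m, vk in zip(mus, v)]
    # Barycenter = weighted geometric mean of the current projections.
    log_b = sum(wk * np.log(K.T @ uk) for wk, uk in zip(w, u))
    b = np.exp(log_b)
    v = [b / (K.T @ uk) for uk in u]
b /= b.sum()
```

For this symmetric instance the barycenter concentrates around the midpoint $x = 0.5$, as expected for the $W_2$ barycenter of two displaced bumps.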
no code implementations • 11 Jun 2020 • Daniil Tiapkin, Alexander Gasnikov, Pavel Dvurechensky
This leads to a complicated stochastic optimization problem in which the objective is the expectation of a function that is itself defined as the solution to a random inner optimization problem.