no code implementations • 6 Mar 2024 • Antoine Scheid, Daniil Tiapkin, Etienne Boursier, Aymeric Capitaine, El Mahdi El Mhamdi, Eric Moulines, Michael I. Jordan, Alain Durmus
This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent.
no code implementations • 26 Oct 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard
In particular, we study demonstration-regularized reinforcement learning, which leverages expert demonstrations via KL-regularization toward a policy learned by behavior cloning.
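The core idea can be illustrated on a one-step toy problem (numbers and names below are our own, not the paper's): maximizing expected reward minus a KL penalty toward a behavior-cloning policy has the closed-form solution $\pi^*(a) \propto \pi_{\mathrm{BC}}(a)\,e^{r(a)/\lambda}$.

```python
import numpy as np

# Toy illustration of KL-regularization toward a behavior-cloning policy:
# for a one-step problem, max_pi E_pi[r] - lam * KL(pi || pi_bc) has the
# closed form pi*(a) proportional to pi_bc(a) * exp(r(a) / lam).
r = np.array([1.0, 0.5, 0.0])       # rewards per action (made up)
pi_bc = np.array([0.2, 0.5, 0.3])   # policy obtained by behavior cloning
lam = 0.5                           # regularization strength

pi_star = pi_bc * np.exp(r / lam)
pi_star /= pi_star.sum()            # normalize to a probability distribution

def objective(pi):
    # expected reward minus lam times KL(pi || pi_bc)
    return pi @ r - lam * np.sum(pi * np.log(pi / pi_bc))
```

A larger $\lambda$ keeps $\pi^*$ closer to the demonstrations; $\lambda \to 0$ recovers the greedy reward maximizer.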
no code implementations • 22 Oct 2023 • Sergey Samsonov, Daniil Tiapkin, Alexey Naumov, Eric Moulines
In this paper, we consider the problem of obtaining sharp bounds on the performance of temporal difference (TD) methods with linear function approximation for policy evaluation in discounted Markov decision processes.
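For context, the method under study can be sketched as follows — a minimal TD(0) policy-evaluation loop on a made-up 3-state Markov reward process with one-hot features (the tabular special case of linear function approximation); this is an illustration of the setting, not the paper's analysis.

```python
import numpy as np

# TD(0) with linear function approximation on a toy Markov reward process.
np.random.seed(0)
P = np.array([[0.5, 0.5, 0.0],   # transition matrix (made-up example)
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, 2.0])    # deterministic reward per state
gamma = 0.9

# Ground truth for comparison: V = (I - gamma * P)^{-1} r
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

phi = np.eye(3)                  # one-hot features => tabular special case
w = np.zeros(3)                  # linear weights, V_hat(s) = phi[s] @ w
s = 0
for t in range(100_000):
    alpha = 1.0 / (1 + t) ** 0.7                       # decaying step size
    s_next = np.random.choice(3, p=P[s])
    td_error = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w += alpha * td_error * phi[s]
    s = s_next
```

With one-hot features the iterates converge to `V_true`; with general features they converge to the projected fixed point instead.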
1 code implementation • 19 Oct 2023 • Daniil Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov
We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure.
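One way to see the connection on a toy example (the DAG and rewards below are ours): log-flows satisfy a soft, logsumexp Bellman recursion, and the induced forward policy samples terminal states with probability proportional to reward.

```python
import numpy as np

# Toy sketch of the GFlowNet <-> entropy-regularized RL correspondence:
# on a DAG, log F(s) = logsumexp over children of log F(child), with
# terminal log-flows set to log-rewards (made-up example).
children = {"s0": ["s1", "s2"], "s1": ["x1", "x2"], "s2": ["x3", "x4"]}
log_reward = {"x1": np.log(1.0), "x2": np.log(2.0),
              "x3": np.log(3.0), "x4": np.log(4.0)}

log_F = dict(log_reward)           # terminal flows equal rewards
for s in ["s1", "s2", "s0"]:       # reverse topological order
    log_F[s] = np.logaddexp.reduce(np.array([log_F[c] for c in children[s]]))

def terminal_probs():
    # Forward policy P(child | s) = F(child) / F(s); accumulate path probs.
    probs = {}
    def walk(s, logp):
        if s not in children:
            probs[s] = probs.get(s, 0.0) + np.exp(logp)
            return
        for c in children[s]:
            walk(c, logp + log_F[c] - log_F[s])
    walk("s0", 0.0)
    return probs

p = terminal_probs()
# Each terminal state is sampled with probability proportional to its reward.
```

Here total reward is 10, so e.g. `x4` is reached with probability 0.4.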
no code implementations • 6 Apr 2023 • Denis Belomestny, Pierre Ménard, Alexey Naumov, Daniil Tiapkin, Michal Valko
These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum.
no code implementations • 16 Mar 2023 • Sholom Schechtman, Daniil Tiapkin, Michael Muehlebach, Eric Moulines
We consider the problem of minimizing a non-convex function over a smooth manifold $\mathcal{M}$.
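A standard baseline for this setting, shown here as a sketch on a made-up instance (not the paper's algorithm), is Riemannian gradient descent on the unit sphere: project the Euclidean gradient onto the tangent space, step, then retract back to the manifold. For $f(x) = x^\top A x$, the minimum over the sphere is the smallest eigenvalue of $A$.

```python
import numpy as np

# Riemannian gradient descent on the unit sphere for f(x) = x^T A x.
np.random.seed(1)
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])   # symmetric matrix (made-up example)

x = np.random.randn(3)
x /= np.linalg.norm(x)           # start on the sphere
for _ in range(2000):
    g = 2 * A @ x                # Euclidean gradient of f
    g_tan = g - (g @ x) * x      # project onto the tangent space at x
    x = x - 0.05 * g_tan         # gradient step in the tangent space
    x /= np.linalg.norm(x)       # retraction: renormalize onto the sphere

f_val = x @ A @ x
lam_min = np.linalg.eigvalsh(A).min()
```

For generic initialization this converges to an eigenvector of the smallest eigenvalue, so `f_val` approaches `lam_min`.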
1 code implementation • 14 Mar 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Ménard
Finally, we apply developed regularization techniques to reduce sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.
1 code implementation • 28 Sep 2022 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Ménard
We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions.
no code implementations • 16 May 2022 • Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Ménard
We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision processes: a natural extension of the Bayes-UCB algorithm of Kaufmann et al. (2012) for multi-armed bandits.
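The bandit ancestor is easy to sketch (the arm means below are made up, and this is the bandit algorithm, not Bayes-UCBVI itself): with Beta posteriors over Bernoulli arms, pull the arm whose posterior $(1 - 1/t)$-quantile is largest.

```python
import numpy as np
from scipy.stats import beta

# Bayes-UCB (Kaufmann et al., 2012) for Bernoulli multi-armed bandits.
np.random.seed(0)
means = [0.2, 0.5, 0.8]            # true, unknown arm means (made up)
succ = np.ones(3)                  # Beta(1, 1) prior: success counts + 1
fail = np.ones(3)                  # Beta(1, 1) prior: failure counts + 1
counts = np.zeros(3, dtype=int)

for t in range(1, 2001):
    # Index of each arm: upper (1 - 1/t)-quantile of its Beta posterior.
    q = beta.ppf(1 - 1.0 / t, succ, fail)
    a = int(np.argmax(q))
    reward = np.random.rand() < means[a]
    succ[a] += reward
    fail[a] += 1 - reward
    counts[a] += 1
```

Bayes-UCBVI replaces the Beta posterior quantile over an arm's mean with a quantile over optimistic value estimates in each state of the MDP.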
no code implementations • 27 Feb 2021 • Daniil Tiapkin, Alexander Gasnikov
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
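The classical baseline for this problem is value iteration: repeatedly apply the Bellman optimality operator, which is a $\gamma$-contraction, until the values converge. A minimal sketch on a made-up 2-state, 2-action MDP:

```python
import numpy as np

# Value iteration for an infinite-horizon discounted MDP (toy example).
P = np.array([                  # P[a, s, s']: transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],   # under action 0
    [[0.5, 0.5], [0.0, 1.0]],   # under action 1
])
R = np.array([[0.0, 1.0],       # R[a, s]: expected one-step reward
              [0.5, 2.0]])
gamma = 0.95

V = np.zeros(2)
for _ in range(2000):
    Q = R + gamma * P @ V       # Q[a, s]: one Bellman backup per action
    V_new = Q.max(axis=0)       # Bellman optimality operator
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
policy = Q.argmax(axis=0)       # greedy policy w.r.t. the converged values
```

The fixed point satisfies $V^*(s) = \max_a \big[R(s,a) + \gamma \sum_{s'} P(s'\mid s,a)\,V^*(s')\big]$, and the greedy policy with respect to $V^*$ is optimal.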
no code implementations • 9 Oct 2020 • Darina Dvinskikh, Daniil Tiapkin
In this paper, we focus on computational aspects of the Wasserstein barycenter problem.
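One widely used computational approach (shown here on a made-up 1-D instance; the paper's contribution is its complexity analysis, not this specific routine) is the entropic-regularized barycenter computed by iterative Bregman projections with a Gibbs kernel.

```python
import numpy as np

# Entropic-regularized Wasserstein barycenter of two 1-D histograms,
# computed by iterative Bregman projections (Sinkhorn-style updates).
n = 50
x = np.linspace(0, 1, n)
C = (x[:, None] - x[None, :]) ** 2   # squared-distance cost on the grid
eps = 1e-2                           # entropic regularization strength
K = np.exp(-C / eps)                 # Gibbs kernel

def gaussian(mu, sig):
    p = np.exp(-(x - mu) ** 2 / (2 * sig ** 2))
    return p / p.sum()

mus = [gaussian(0.25, 0.05), gaussian(0.75, 0.05)]  # input measures
w = [0.5, 0.5]                                      # barycenter weights

v = [np.ones(n) for _ in mus]
for _ in range(500):
    u = [m / (K @ vk) for m, vk in zip(mus, v)]
    # Barycenter = weighted geometric mean of the current projections.
    log_b = sum(wk * np.log(K.T @ uk) for wk, uk in zip(w, u))
    b = np.exp(log_b)
    v = [b / (K.T @ uk) for uk in u]
b /= b.sum()
```

For this symmetric instance the barycenter concentrates around the midpoint $x = 0.5$, as expected for the $W_2$ barycenter of two displaced bumps.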
no code implementations • 11 Jun 2020 • Daniil Tiapkin, Alexander Gasnikov, Pavel Dvurechensky
This leads to a complicated stochastic optimization problem in which the objective is the expectation of a function that is itself defined as the solution to a random inner optimization problem.