no code implementations • 29 May 2024 • Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy
We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting.
no code implementations • 27 Sep 2023 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting.
1 code implementation • 8 Sep 2022 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
We study a posterior sampling approach to efficient exploration in constrained reinforcement learning.
1 code implementation • 14 Feb 2022 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
Our main theoretical results show that the impact of batch learning is a multiplicative factor of batch size relative to the regret of online behavior.
1 code implementation • 3 Nov 2021 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
We consider a special case of bandit problems, namely batched bandits.
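In the batched setting, the policy can update only after a whole batch of decisions has been made. The sketch below shows that constraint on Bernoulli Thompson sampling: the Beta posteriors are refreshed only at batch boundaries, so every decision inside a batch uses stale information. The choice of Thompson sampling, the Beta(1, 1) priors, and the function name `batched_thompson` are illustrative assumptions, not necessarily what the paper uses.

```python
import random


def batched_thompson(true_means, horizon, batch_size, seed=0):
    """Bernoulli Thompson sampling whose Beta posteriors are refreshed only
    at the end of each batch of `batch_size` decisions (hypothetical sketch).

    Returns (pseudo_regret, alpha, beta).
    """
    rng = random.Random(seed)
    k = len(true_means)
    best = max(true_means)
    alpha = [1.0] * k  # Beta(1, 1) prior on each arm's success probability
    beta = [1.0] * k
    pending = []  # (arm, reward) pairs observed but not yet used by the policy
    regret = 0.0
    for _ in range(horizon):
        # Draw from the current (possibly stale) posterior and play the argmax.
        draws = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        regret += best - true_means[arm]  # expected (pseudo-)regret of this pull
        pending.append((arm, reward))
        # The policy sees the new data only once a full batch has been collected.
        if len(pending) == batch_size:
            for a, r in pending:
                alpha[a] += r
                beta[a] += 1 - r
            pending.clear()
    return regret, alpha, beta
```

Setting `batch_size=1` recovers ordinary online Thompson sampling. Comparing, say, `batch_size=1` against `batch_size=500` over several seeds shows how delayed feedback affects regret. A single seeded run is noisy, though, so it should not be read as a bound.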
no code implementations • 21 Aug 2019 • Jules Kruijswijk, Petri Parvinen, Maurits Kaptein
We propose and evaluate an extension of the existing method such that it can be used to evaluate CAB policies.
1 code implementation • 19 Apr 2019 • Reza Mohammadi, Matthew Pratola, Maurits Kaptein
In a Bayesian framework for regression trees, Markov Chain Monte Carlo (MCMC) search algorithms are required to generate samples of tree models according to their posterior probabilities.
no code implementations • 6 Nov 2018 • Robin van Emden, Maurits Kaptein
Over the past decade, contextual bandit algorithms have been gaining in popularity due to their effectiveness and flexibility in solving sequential decision problems, from online advertising and finance to clinical trial design and personalized medicine.
2 code implementations • 28 Feb 2018 • Maurits Kaptein, Paul Ketelaar
In marketing we are often confronted with a continuous stream of responses to marketing messages.
2 code implementations • 22 Feb 2016 • Jules Kruijswijk, Robin van Emden, Petri Parvinen, Maurits Kaptein
A large number of statistical decision problems in the social sciences and beyond can be framed as a (contextual) multi-armed bandit problem.
Human-Computer Interaction; Computers and Society
no code implementations • 2 Feb 2015 • Maurits Kaptein, Davide Iannuzzi
We often encounter situations in which an experimenter wants to find, by sequential experimentation, $x_{\max} = \arg\max_{x} f(x)$, where $f(x)$ is a (possibly unknown) function of a well-controllable variable $x$.
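The snippet states the problem but not the paper's own method. To make the setting concrete, here is a sketch of a classic, different technique for it: Kiefer-Wolfowitz stochastic approximation. At each step it runs two noisy experiments at $x \pm c_n$ and moves $x$ along the finite-difference slope. The gain sequences $a_n = a/n$ and $c_n = c/n^{1/3}$, the function name, and the quadratic test function are assumptions chosen for illustration.

```python
import random


def kiefer_wolfowitz(noisy_f, x0, n_steps, a=0.5, c=1.0):
    """Kiefer-Wolfowitz stochastic approximation (not the paper's method):
    climb towards argmax_x f(x) using only noisy evaluations of f."""
    x = x0
    for n in range(1, n_steps + 1):
        a_n = a / n             # step size; satisfies the usual KW conditions
        c_n = c / n ** (1 / 3)  # probe half-width, shrinking more slowly
        # Finite-difference slope estimate from two noisy experiments.
        slope = (noisy_f(x + c_n) - noisy_f(x - c_n)) / (2 * c_n)
        x += a_n * slope        # move uphill, since we seek the maximum
    return x


rng = random.Random(0)


def noisy_quadratic(x):
    # Hypothetical experiment: true f(x) = -(x - 2)^2, observed with N(0, 0.1^2) noise.
    return -(x - 2.0) ** 2 + rng.gauss(0.0, 0.1)


x_hat = kiefer_wolfowitz(noisy_quadratic, x0=0.0, n_steps=2000)
```

Only noisy function values at chosen design points are needed here, which matches the "possibly unknown $f$" framing. The price is that every step spends two experiments.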
no code implementations • 15 Oct 2014 • Dean Eckles, Maurits Kaptein
Subsequently, we detail why bootstrap Thompson sampling (BTS) using the online bootstrap is more scalable than regular Thompson sampling, and we show through simulation that BTS is more robust to a misspecified error distribution.
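A minimal sketch of bootstrap Thompson sampling for Bernoulli rewards follows. It assumes the double-or-nothing online bootstrap: for each new observation, every replicate independently either adds it with weight 2 or ignores it, each with probability 1/2. To act, the policy draws one replicate per arm uniformly at random and plays the arm with the highest point estimate. The class name, the default of 100 replicates, and the pseudo-count start (1 success in 2 trials per replicate, so every estimate is defined) are assumptions.

```python
import random


class BootstrapThompson:
    """Hypothetical sketch of bootstrap Thompson sampling (Bernoulli arms)
    with the double-or-nothing online bootstrap."""

    def __init__(self, n_arms, n_replicates=100, seed=0):
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        self.n_replicates = n_replicates
        # Weighted counts per arm and replicate; assumed pseudo-count start.
        self.successes = [[1.0] * n_replicates for _ in range(n_arms)]
        self.trials = [[2.0] * n_replicates for _ in range(n_arms)]

    def select_arm(self):
        # One uniformly drawn replicate per arm stands in for a posterior draw.
        estimates = []
        for a in range(self.n_arms):
            j = self.rng.randrange(self.n_replicates)
            estimates.append(self.successes[a][j] / self.trials[a][j])
        return max(range(self.n_arms), key=lambda a: estimates[a])

    def update(self, arm, reward):
        # Double-or-nothing: weight 2 with probability 1/2, otherwise skip.
        for j in range(self.n_replicates):
            if self.rng.random() < 0.5:
                self.successes[arm][j] += 2.0 * reward
                self.trials[arm][j] += 2.0


def simulate(true_means, horizon, seed=0):
    """Run the policy against simulated Bernoulli arms; return pull counts."""
    env = random.Random(seed + 1)
    policy = BootstrapThompson(len(true_means), seed=seed)
    pulls = [0] * len(true_means)
    for _ in range(horizon):
        arm = policy.select_arm()
        reward = 1 if env.random() < true_means[arm] else 0
        policy.update(arm, reward)
        pulls[arm] += 1
    return pulls
```

Each update costs a fixed number of replicate updates and no posterior computation. The replicates could also be kept on separate machines. This is the kind of scalability argument the abstract alludes to.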