Search Results for author: Maurits Kaptein

Found 12 papers, 6 papers with code

Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling

no code implementations • 29 May 2024 • Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy

We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting.

Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need

no code implementations • 27 Sep 2023 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting.

Efficient Exploration

An Empirical Evaluation of Posterior Sampling for Constrained Reinforcement Learning

1 code implementation • 8 Sep 2022 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

We study a posterior sampling approach to efficient exploration in constrained reinforcement learning.

Efficient Exploration, reinforcement-learning

The Impact of Batch Learning in Stochastic Linear Bandits

1 code implementation • 14 Feb 2022 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

Our main theoretical results show that the impact of batch learning is a multiplicative factor of batch size relative to the regret of online behavior.

The Impact of Batch Learning in Stochastic Bandits

1 code implementation • 3 Nov 2021 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

We consider a special case of bandit problems, namely batched bandits.

Recommendation Systems

Exploring Offline Policy Evaluation for the Continuous-Armed Bandit Problem

no code implementations • 21 Aug 2019 • Jules Kruijswijk, Petri Parvinen, Maurits Kaptein

We propose and evaluate an extension of the existing method such that it can be used to evaluate continuous-armed bandit (CAB) policies.

Decision Making

Continuous-Time Birth-Death MCMC for Bayesian Regression Tree Models

1 code implementation • 19 Apr 2019 • Reza Mohammadi, Matthew Pratola, Maurits Kaptein

In a Bayesian framework for regression trees, Markov Chain Monte Carlo (MCMC) search algorithms are required to generate samples of tree models according to their posterior probabilities.

regression

contextual: Evaluating Contextual Multi-Armed Bandit Problems in R

no code implementations • 6 Nov 2018 • Robin van Emden, Maurits Kaptein

Over the past decade, contextual bandit algorithms have been gaining in popularity due to their effectiveness and flexibility in solving sequential decision problems, from online advertising and finance to clinical trial design and personalized medicine.

Object

Maximum likelihood estimation of a finite mixture of logistic regression models in a continuous data stream

2 code implementations • 28 Feb 2018 • Maurits Kaptein, Paul Ketelaar

In marketing we are often confronted with a continuous stream of responses to marketing messages.

Marketing, regression

StreamingBandit; Experimenting with Bandit Policies

2 code implementations • 22 Feb 2016 • Jules Kruijswijk, Robin van Emden, Petri Parvinen, Maurits Kaptein

A large number of statistical decision problems in the social sciences and beyond can be framed as a (contextual) multi-armed bandit problem.

Human-Computer Interaction, Computers and Society

Lock in Feedback in Sequential Experiments

no code implementations • 2 Feb 2015 • Maurits Kaptein, Davide Iannuzzi

We often encounter situations in which an experimenter wants to find, by sequential experimentation, $x_{\max} = \arg\max_{x} f(x)$, where $f(x)$ is a (possibly unknown) function of a well-controllable variable $x$.

Thompson sampling with the online bootstrap

no code implementations • 15 Oct 2014 • Dean Eckles, Maurits Kaptein

Subsequently, we detail why bootstrap Thompson sampling (BTS) using the online bootstrap is more scalable than regular Thompson sampling, and we show through simulation that BTS is more robust to a misspecified error distribution.

Thompson Sampling
