no code implementations • 15 Sep 2023 • Xuedong Shang, Igor Colin, Merwan Barlier, Hamza Cherkaoui
We introduce the safe best-arm identification framework with linear feedback, where the agent is subject to a stage-wise safety constraint that depends linearly on an unknown parameter vector.
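A constraint that is linear in an unknown parameter is often enforced pessimistically: an arm is allowed only if an upper confidence bound on its safety value stays below the threshold. The sketch below shows that generic check. It is not taken from the paper. The estimate `theta_hat`, the design matrix `V`, the confidence radius `beta` and the `threshold` are all assumed inputs, and the function name is hypothetical.

```python
import numpy as np

def pessimistic_safe_arms(arms, theta_hat, V, beta, threshold):
    """Hypothetical helper: indices of arms certified safe.

    An arm x is kept only if the upper confidence bound
    x^T theta_hat + beta * ||x||_{V^{-1}} on its safety value x^T theta
    does not exceed `threshold`.
    """
    V_inv = np.linalg.inv(V)
    safe = []
    for i, x in enumerate(arms):
        width = beta * np.sqrt(x @ V_inv @ x)  # confidence width along x
        if x @ theta_hat + width <= threshold:
            safe.append(i)
    return safe

arms = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(pessimistic_safe_arms(arms, np.array([0.2, 0.5]), 100.0 * np.eye(2), 1.0, 0.5))
```

With these inputs, only the first arm has an upper bound (0.3) below the threshold of 0.5, so the call returns `[0]`.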
1 code implementation • 1 Mar 2021 • Pierre Ménard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko
We propose UCBMQ (Upper Confidence Bound Momentum Q-learning), a new algorithm for reinforcement learning in tabular, episodic, and possibly stage-dependent Markov decision processes.
no code implementations • 15 Oct 2020 • Xuedong Shang, Han Shao, Jian Qian
We study two goals: (a) finding, with a given confidence level, the arm with the minimum $\ell^\infty$-norm of relative losses (fixed-confidence best-arm identification); and (b) minimizing the $\ell^\infty$-norm of cumulative relative losses (regret minimization).
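As an illustration of goal (a) when the mean losses are known, the sketch below picks the target arm. It assumes a particular definition that the abstract does not confirm: each arm has a vector of mean losses, and its relative loss in a dimension is its mean loss minus the smallest mean loss that any arm attains in that dimension. Both function names are hypothetical.

```python
def linf_relative_losses(mean_losses):
    """Assumed definition: max over dimensions j of mu[i][j] - min_k mu[k][j]."""
    n_dims = len(mean_losses[0])
    # Best (smallest) mean loss in each dimension across all arms.
    best = [min(row[j] for row in mean_losses) for j in range(n_dims)]
    return [max(row[j] - best[j] for j in range(n_dims)) for row in mean_losses]

def best_arm(mean_losses):
    """Arm minimizing the l-infinity norm of relative losses (first one on ties)."""
    norms = linf_relative_losses(mean_losses)
    return min(range(len(norms)), key=norms.__getitem__)

losses = [[1, 9], [5, 5], [9, 1]]
print(linf_relative_losses(losses), best_arm(losses))
```

Here the balanced arm wins, with a worst-case relative loss of 4 against 8 for either specialized arm. In the bandit setting, these means are unknown and must be estimated from samples.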
no code implementations • ICML 2020 • Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko
We investigate an active pure-exploration setting that includes best-arm identification, in the context of linear stochastic bandits.
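In linear stochastic bandits, the mean reward of arm x is assumed to be x^T theta for an unknown theta. Pure-exploration methods therefore typically maintain a regularized least-squares estimate and read off the empirically best arm. The sketch below shows only that standard building block, not the paper's algorithm. The regularization `lam` and both function names are illustrative choices.

```python
import numpy as np

def ridge_estimate(X, y, lam=1.0):
    """Regularized least squares: theta_hat = (lam*I + X^T X)^{-1} X^T y."""
    d = X.shape[1]
    V = lam * np.eye(d) + X.T @ X  # design matrix, also used for confidence sets
    theta_hat = np.linalg.solve(V, X.T @ y)
    return theta_hat, V

def empirical_best_arm(arms, theta_hat):
    """Arm with the largest estimated mean reward x^T theta_hat."""
    return int(np.argmax(arms @ theta_hat))

arms = np.eye(3)
theta_true = np.array([0.1, 0.9, 0.4])
X = np.repeat(arms, 50, axis=0)  # each arm pulled 50 times
y = X @ theta_true               # noise-free rewards, for illustration only
theta_hat, V = ridge_estimate(X, y)
print(empirical_best_arm(arms, theta_hat))
```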
no code implementations • 24 Oct 2019 • Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko
We investigate the sampling rule called Top-Two Thompson Sampling (TTTS) and provide new insights into it.
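TTTS works as follows. A posterior sample names a "leader" arm. With probability `beta`, the leader is pulled. Otherwise, the posterior is resampled until a different arm comes out on top, and that "challenger" is pulled. The sketch below assumes unit-variance Gaussian rewards with a flat prior, so each arm's posterior is N(empirical mean, 1/n). The `max_resample` cap is a practical safeguard added here; it is not part of the rule's definition.

```python
import random

def ttts_select(means, counts, beta=0.5, rng=random, max_resample=10000):
    """Choose the next arm with a Top-Two Thompson Sampling step.

    Assumes Gaussian posteriors N(means[i], 1 / counts[i]) with counts[i] >= 1.
    """
    k = len(means)

    def sample_best():
        # One posterior draw per arm; return the arm with the largest draw.
        draws = [rng.gauss(means[i], 1.0 / counts[i] ** 0.5) for i in range(k)]
        return max(range(k), key=lambda i: draws[i])

    leader = sample_best()
    if rng.random() < beta:
        return leader
    # Resample until a different arm becomes the posterior argmax (challenger).
    for _ in range(max_resample):
        challenger = sample_best()
        if challenger != leader:
            return challenger
    return leader  # safeguard if the posterior is extremely concentrated

print(ttts_select([0.0, 1.0, 0.5], [10, 10, 10], beta=0.5, rng=random.Random(0)))
```

The resampling loop can take many draws once one arm dominates the posterior. This is why implementations often cap it or compute challenger probabilities in closed form.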