no code implementations • ICML 2020 • Claire Vernade, András György, Timothy Mann
In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete.
no code implementations • 8 Feb 2024 • Nicolas Nguyen, Imad Aouali, András György, Claire Vernade
We study the problem of Bayesian fixed-budget best-arm identification (BAI) in structured bandits.
no code implementations • NeurIPS 2023 • Alexandre Marthe, Aurélien Garivier, Claire Vernade
What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics.
no code implementations • 30 Dec 2022 • Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy
We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.
1 code implementation • 25 Feb 2022 • MohammadJavad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh
We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks.
1 code implementation • ICLR 2022 • Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel
We build on the recently proposed EigenGame that views eigendecomposition as a competitive game.
no code implementations • 11 Nov 2020 • Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári
We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.
no code implementations • 20 Oct 2020 • Alexandra Carpentier, Claire Vernade, Yasin Abbasi-Yadkori
This note proposes a new proof and new perspectives on the so-called Elliptical Potential Lemma.
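For reference, the lemma in its standard form (as commonly stated in the linear bandits literature; constants vary slightly across sources) reads:

```latex
% Elliptical Potential Lemma (standard form; constants differ by source).
% Let a_1, \dots, a_n \in \mathbb{R}^d with \|a_t\| \le L, and define
% V_t = \lambda I + \sum_{s=1}^{t} a_s a_s^\top. Then
\sum_{t=1}^{n} \min\!\left(1,\; \|a_t\|^2_{V_{t-1}^{-1}}\right)
  \;\le\; 2d \log\!\left(1 + \frac{n L^2}{d\lambda}\right)
```

The quantity bounded here controls the cumulative "surprise" of the observed feature directions, which is why the lemma appears throughout regret analyses for linear bandits.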
2 code implementations • ICLR 2021 • Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel
We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function.
1 code implementation • 18 Jun 2020 • Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári
We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies.
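The textbook building block for this kind of off-policy evaluation is the inverse-propensity-scoring (IPS) estimator. The sketch below shows plain IPS on logged bandit data; it is not the robust selection strategy developed in the paper, and the data layout and function names are illustrative.

```python
def ips_value(logged, target_prob):
    """Inverse-propensity-scoring (IPS) estimate of a target policy's
    value from logged contextual-bandit data.

    `logged` is a list of (context, action, reward, logging_prob)
    tuples; `target_prob(context, action)` returns the probability
    the target policy assigns to `action` in `context`. Each reward
    is reweighted by target_prob / logging_prob, which makes the
    estimator unbiased when the logging probabilities are correct.
    """
    n = len(logged)
    return sum(target_prob(c, a) / p * r for c, a, r, p in logged) / n
```

Because the weights blow up when the target policy puts mass where the logging policy rarely acted, plain IPS can have very high variance, which is what motivates more robust selection strategies.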
no code implementations • ICML 2020 • Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko
Significant work has recently been dedicated to the stochastic delayed bandit setting because of its relevance in applications.
no code implementations • 6 Dec 2019 • Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes
Stochastic Rank-One Bandits (Katariya et al., 2017a,b) are a simple framework for regret minimization problems over rank-one matrices of arms.
1 code implementation • NeurIPS 2019 • Yoan Russac, Claire Vernade, Olivier Cappé
To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past.
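The estimator underlying this approach is exponentially weighted (discounted) ridge regression: each past observation is down-weighted by a factor gamma per time step, so the estimate tracks a slowly drifting parameter. The sketch below illustrates that idea; the values of `gamma` and `lam` and the exact form of the update are illustrative, not the paper's construction.

```python
import numpy as np

def discounted_ridge(features, rewards, gamma=0.99, lam=0.1):
    """Exponentially discounted ridge regression.

    An observation that is `k` steps old contributes with weight
    gamma**k, so recent data dominates the estimate. The ridge term
    lam * I is kept at full weight rather than discounted away.
    """
    d = features.shape[1]
    V = lam * np.eye(d)   # discounted, regularised design matrix
    b = np.zeros(d)       # discounted reward-weighted feature sum
    for a, r in zip(features, rewards):
        # discount the past, then re-add the undiscounted ridge term
        V = gamma * (V - lam * np.eye(d)) + np.outer(a, a) + lam * np.eye(d)
        b = gamma * b + r * a
    return np.linalg.solve(V, b)
```

On stationary data this behaves like ordinary ridge regression with an effective sample size of roughly 1/(1-gamma); under drift, the discounting is what lets the estimate "smoothly forget the past."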
no code implementations • ICML 2020 • Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner
Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation.
no code implementations • 27 Jul 2017 • Mastane Achab, Stephan Clémençon, Aurélien Garivier, Anne Sabourin, Claire Vernade
This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values.
no code implementations • 28 Jun 2017 • Claire Vernade, Olivier Cappé, Vianney Perchet
We assume that the probability of conversion associated with each action is unknown, while the distribution of the conversion delay is known. We distinguish between the (idealized) case where conversion events may be observed whatever their delay and the more realistic setting in which late conversions are censored.
no code implementations • 5 Jun 2017 • Joon Kwon, Vianney Perchet, Claire Vernade
In the classical multi-armed bandit problem, $d$ arms are available to the decision maker, who pulls them sequentially to maximize their cumulative reward.
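A minimal sketch of this classical setting is the textbook UCB1 algorithm on Bernoulli arms, shown below. This is the standard baseline, not the algorithm contributed by the paper above, and the arm means in the usage example are made up.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """UCB1 on Bernoulli arms.

    Pull each arm once, then always pull the arm maximising
    empirical mean + sqrt(2 * ln t / pulls). Returns the pull
    counts per arm, which concentrate on the best arm over time.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:  # initialisation round: one pull per arm
            arm = t - 1
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                                    + math.sqrt(2 * math.log(t) / counts[i]))
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < arm_means[arm] else 0.0
    return counts
```

For example, `ucb1([0.1, 0.9], 2000)` allocates the vast majority of pulls to the second arm, since the confidence bonus on the inferior arm shrinks quickly once its low mean is established.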
no code implementations • 19 Mar 2017 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen
The probability that a user will click a search result depends both on its relevance and its position on the results page.
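The sentence above describes the position-based model (PBM), in which a click requires the position to be examined and the item, once examined, to be attractive, so P(click) = examination * relevance. A minimal sketch of one click draw under this model (parameter names are illustrative):

```python
import random

def pbm_click(relevance, examination, rng):
    """One Bernoulli click under the position-based model (PBM).

    The user examines the position with probability `examination`
    and, conditionally on examining it, clicks with probability
    `relevance`; the two events are independent, so the overall
    click probability factorises as examination * relevance.
    """
    return rng.random() < examination and rng.random() < relevance
```

Averaging many such draws recovers the product examination * relevance, which is exactly the identifiability challenge in these models: a low click rate cannot, from one position alone, be attributed to low relevance versus low examination.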
no code implementations • 10 Aug 2016 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen
The main challenge of the problem is that the individual values of the row and column are unobserved.
no code implementations • NeurIPS 2016 • Paul Lagrée, Claire Vernade, Olivier Cappé
Sequentially learning to place items in multi-position displays or lists is a task that can be cast into the multiple-play semi-bandit setting.
no code implementations • 4 Mar 2016 • Hossein Vahabi, Paul Lagrée, Claire Vernade, Olivier Cappé
In many web applications, a recommendation is not a single item suggested to a user but a list of potentially interesting items, which may be ranked differently depending on the context.
no code implementations • 30 Sep 2015 • Claire Vernade, Olivier Cappé
Recommending items to users is a challenging task due to the large amount of missing information.