no code implementations • 9 Feb 2024 • Tor Lattimore
Bandit convex optimisation is a fundamental framework for studying zeroth-order convex optimisation.
no code implementations • 17 May 2023 • Xin Zhou, Botao Hao, Jian Kang, Tor Lattimore, Lexin Li
A brain-computer interface (BCI) is a technology that enables direct communication between the brain and an external device or computer system.
no code implementations • 10 Feb 2023 • Tor Lattimore, András György
We introduce a simple and efficient algorithm for unconstrained zeroth-order stochastic convex bandits and prove its regret is at most $(1 + r/d)[d^{1.5}\sqrt{n} + d^3]\,\mathrm{polylog}(n, d, r)$ where $n$ is the horizon, $d$ is the dimension and $r$ is the radius of a known ball containing the minimiser of the loss.
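As a quick sanity check on how this bound scales, the sketch below evaluates it numerically; the leading constant and the use of $\log(ndr)$ as a stand-in for the unspecified polylogarithmic factor are assumptions for illustration only.

```python
import math

def regret_bound(n, d, r, c=1.0):
    """Evaluate (1 + r/d) * [d^1.5 * sqrt(n) + d^3] * polylog(n, d, r),
    with log(n*d*r) standing in for the polylog factor and c for the
    hidden constant (both illustrative assumptions)."""
    polylog = math.log(n * d * r)
    return c * (1 + r / d) * (d ** 1.5 * math.sqrt(n) + d ** 3) * polylog

# For fixed d and r the bound grows roughly like sqrt(n) in the horizon.
print(regret_bound(n=10_000, d=10, r=5.0))
```

Quadrupling the horizon roughly doubles the bound, reflecting the $\sqrt{n}$ dependence once the $d^3$ term is dominated.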
no code implementations • 7 Feb 2023 • Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen
This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level.
no code implementations • 7 Feb 2023 • Johannes Kirschner, Tor Lattimore, Andreas Krause
Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models.
no code implementations • 9 Jun 2022 • Botao Hao, Tor Lattimore
Information-directed sampling (IDS) has revealed its potential as a data-efficient algorithm for reinforcement learning (RL).
no code implementations • 26 May 2022 • Sanae Amani, Tor Lattimore, András György, Lin F. Yang
In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds.
no code implementations • 22 May 2022 • Botao Hao, Tor Lattimore, Chao Qin
Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm.
no code implementations • 22 Feb 2022 • Tor Lattimore
We show that a version of the generalised information ratio of Lattimore and György (2020) determines the asymptotic minimax regret for all finite-action partial monitoring games provided that (a) the standard definition of regret is used but the latent space where the adversary plays is potentially infinite; or (b) the regret introduced by Rustichini (1999) is used and the latent space is finite.
no code implementations • NeurIPS 2021 • Brendan O'Donoghue, Tor Lattimore
We derive a set of Bayesian 'optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy.
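For context, one round of the Thompson sampling policy mentioned here can be sketched in a few lines for Bernoulli rewards; the Beta(1,1) priors and the two arm means are illustrative assumptions, not details from the paper.

```python
import random

def thompson_sample_arm(successes, failures):
    """One round of Thompson sampling for Bernoulli bandits: draw one
    posterior sample per arm from its Beta posterior and play the argmax."""
    samples = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# Small simulation: two arms with (assumed) means 0.3 and 0.7.
random.seed(0)
means = [0.3, 0.7]
succ, fail = [0, 0], [0, 0]
for _ in range(2000):
    arm = thompson_sample_arm(succ, fail)
    reward = 1 if random.random() < means[arm] else 0
    succ[arm] += reward
    fail[arm] += 1 - reward
print(succ, fail)  # the better arm accumulates most of the pulls
```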
no code implementations • 5 Jul 2021 • Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
no code implementations • NeurIPS 2021 • Tor Lattimore, Botao Hao
We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, \theta_\star\rangle^2$ where $\theta_\star \in \mathbb R^d$ is an unknown parameter vector.
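The reward model in this bandit phase retrieval problem is easy to state in code; the additive Gaussian noise below is an assumption for illustration, since the snippet above only fixes the expected reward.

```python
import random

def phase_retrieval_reward(action, theta, noise_std=0.1):
    """Reward with mean <a, theta>^2, plus Gaussian noise
    (noise model assumed here; the abstract only fixes the mean)."""
    mean = sum(a * t for a, t in zip(action, theta)) ** 2
    return mean + random.gauss(0.0, noise_std)

theta = [0.6, 0.8]        # unknown unit-norm parameter
aligned = [0.6, 0.8]      # action aligned with theta: expected reward 1
orthogonal = [-0.8, 0.6]  # orthogonal action: expected reward 0
```

The squared inner product is what makes this harder than a linear bandit: orthogonal actions are entirely uninformative about the sign of $\theta_\star$.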
no code implementations • 1 Jun 2021 • Tor Lattimore
We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(\langle x, \theta\rangle)$ for convex $g_t : \mathbb R \to \mathbb R$ and unknown $\theta \in \mathbb R^d$ that is homogeneous over time.
no code implementations • NeurIPS 2021 • Botao Hao, Tor Lattimore, Wei Deng
Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure.
no code implementations • 6 Apr 2021 • Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans
First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.
no code implementations • 6 Jan 2021 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos
Exploration is essential for solving complex Reinforcement Learning (RL) tasks.
no code implementations • 11 Nov 2020 • Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári
We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.
no code implementations • NeurIPS 2020 • Botao Hao, Tor Lattimore, Mengdi Wang
Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.
no code implementations • 8 Nov 2020 • Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.
no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.
no code implementations • 25 Sep 2020 • Tor Lattimore, András György
We establish a connection between the stability of mirror descent and the information ratio of Russo and Van Roy [2014].
2 code implementations • NeurIPS 2020 • David Budden, Adam Marblestone, Eren Sezener, Tor Lattimore, Greg Wayne, Joel Veness
We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.
no code implementations • 9 Jun 2020 • Brendan O'Donoghue, Tor Lattimore, Ian Osband
We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff.
no code implementations • 31 May 2020 • Tor Lattimore
We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5}\sqrt{n}\log(n))$, where $d$ is the dimension and $n$ is the number of interactions.
no code implementations • NeurIPS 2020 • Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari
Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee.
no code implementations • 25 Feb 2020 • Johannes Kirschner, Tor Lattimore, Andreas Krause
Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits.
no code implementations • ICML 2020 • Tor Lattimore, Csaba Szepesvari, Gellert Weisz
The construction by Du et al. (2019) implies that even if a learner is given linear features in $\mathbb R^d$ that approximate the rewards in a bandit with a uniform error of $\epsilon$, then searching for an action that is optimal up to $O(\epsilon)$ requires examining essentially all actions.
no code implementations • 15 Oct 2019 • Botao Hao, Tor Lattimore, Csaba Szepesvari
Contextual bandits serve as a fundamental model for many sequential decision making tasks.
2 code implementations • 30 Sep 2019 • Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter
This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs).
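A minimal sketch of the geometric-mixing rule at the heart of a GLN neuron: each neuron combines input probabilities by taking a weighted sum in logit space and mapping back through the sigmoid. The context-dependent gating that selects the weights, which gives the networks their name, is omitted here for brevity.

```python
import math

def geometric_mixing(probs, weights):
    """Combine input probabilities as sigmoid(sum_i w_i * logit(p_i)),
    the mixing rule used by Gated Linear Network neurons."""
    logit = lambda p: math.log(p / (1 - p))
    z = sum(w * logit(p) for w, p in zip(weights, probs))
    return 1 / (1 + math.exp(-z))

# With weights summing to 1, mixing two identical predictions is a no-op.
print(geometric_mixing([0.8, 0.8], [0.5, 0.5]))  # 0.8
```

A useful property of this rule is that weight vectors with a single 1 recover individual inputs exactly, so the mixture can never be forced to do worse than its best input in the log-loss sense.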
no code implementations • 25 Sep 2019 • Yoram Bachrach, Tor Lattimore, Marta Garnelo, Julien Perolat, David Balduzzi, Thomas Anthony, Satinder Singh, Thore Graepel
We show that MARL converges to the desired outcome if the rewards are designed so that exerting effort is the iterated dominance solution, but fails if it is merely a Nash equilibrium.
2 code implementations • ICLR 2020 • Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado van Hasselt
bsuite is a collection of carefully designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives.
no code implementations • 30 Jul 2019 • Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant
For graph search, A* can require $\Omega(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound.
no code implementations • 12 Jul 2019 • Tor Lattimore, Csaba Szepesvari
We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games for which the $n$-round minimax regret is bounded by $6(d+1) k^{3/2} \sqrt{n \log(k)}$, matching the best known information-theoretic upper bound.
no code implementations • 7 Jun 2019 • Laurent Orseau, Levi H. S. Lelis, Tor Lattimore
Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree.
no code implementations • NeurIPS 2019 • Julian Zimmert, Tor Lattimore
The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings.
no code implementations • 19 Mar 2019 • Roman Pogodin, Tor Lattimore
Finally, we study bounds that depend on the degree of separation of the arms, generalising the results of Cowan and Katehakis [2015] from the stochastic setting to the adversarial setting, and improving the result of Seldin and Slivkins [2014] by a factor of $\log(n)/\log(\log(n))$.
no code implementations • 27 Feb 2019 • Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, Pushmeet Kohli
Machine learning is used extensively in recommender systems deployed in products.
no code implementations • 1 Feb 2019 • Tor Lattimore, Csaba Szepesvari
We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary.
no code implementations • NeurIPS 2019 • Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle
We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks.
no code implementations • 8 Jan 2019 • Laurent Orseau, Tor Lattimore, Shane Legg
We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms.
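A standard baseline in this setting, and a useful reference point for the robust algorithms the paper derives, is the Bayesian mixture predictor: predict with the weight-averaged expert forecast, then reweight each expert by the probability it assigned to the realised outcome. The two experts and their predictions below are illustrative assumptions.

```python
def bayes_mixture_update(weights, expert_probs, outcome):
    """One step of the Bayes mixture for binary prediction under log-loss:
    predict the weighted average, then multiply each expert's weight by the
    probability it gave to the realised outcome and renormalise."""
    p = lambda q: q if outcome == 1 else 1 - q
    prediction = sum(w * q for w, q in zip(weights, expert_probs))
    new = [w * p(q) for w, q in zip(weights, expert_probs)]
    total = sum(new)
    return prediction, [w / total for w in new]

# Two experts; the second keeps predicting the observed outcome well.
w = [0.5, 0.5]
for outcome in [1, 1, 1]:
    pred, w = bayes_mixture_update(w, [0.5, 0.9], outcome)
print(w)  # posterior weight concentrates on the better expert
```

The multiplicative update is exactly Bayes' rule, which is why the mixture's cumulative log-loss exceeds the best expert's by at most the log of the inverse prior weight.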
1 code implementation • NeurIPS 2018 • Laurent Orseau, Levi H. S. Lelis, Tor Lattimore, Théophane Weber
We introduce two novel tree search algorithms that use a policy to guide search.
no code implementations • 13 Nov 2018 • Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore
Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.
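The arm-selection step described here can be sketched directly; in this simplified sketch each arm's history is assumed to already contain the pseudo rewards, and the resampling is a plain nonparametric bootstrap.

```python
import random

def bootstrap_arm(histories):
    """Pull the arm whose bootstrap resample (sampling with replacement
    from its reward history, pseudo rewards included) has the highest mean."""
    def boot_mean(history):
        sample = [random.choice(history) for _ in history]
        return sum(sample) / len(sample)
    return max(range(len(histories)), key=lambda i: boot_mean(histories[i]))
```

The randomness of the resampled mean plays the role that posterior sampling plays in Thompson sampling: arms with short or noisy histories get optimistic draws often enough to keep being explored.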
no code implementations • 5 Oct 2018 • Shuai Li, Tor Lattimore, Csaba Szepesvári
We introduce a new model for online ranking in which the click probability factors into an examination and attractiveness function and the attractiveness function is a linear function of a feature vector and an unknown parameter.
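The factored click model is simple to write down; the examination probabilities and parameter vector below are illustrative values, since in the paper both the parameter and (depending on the variant) the examination function are unknowns to be learned.

```python
def click_probability(position, features, theta, examination):
    """Click probability = examination chance of the position times a
    linear attractiveness <features, theta> of the item."""
    attractiveness = sum(x * t for x, t in zip(features, theta))
    return examination[position] * attractiveness

# Higher positions are examined more often (assumed values).
examination = [0.9, 0.6, 0.4]
theta = [0.5, 0.5]
print(click_probability(0, [0.8, 0.6], theta, examination))  # 0.9 * 0.7 = 0.63
```

The factorisation is what makes learning tractable: clicks at different positions share the same attractiveness function, so feedback on one list informs the ranking of items never shown together.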
no code implementations • ICML 2020 • Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner
Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation.
no code implementations • 15 Jun 2018 • Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi
In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists.
no code implementations • NeurIPS 2018 • Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari
Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user.
no code implementations • 23 May 2018 • Tor Lattimore, Csaba Szepesvari
Partial monitoring is a generalization of the well-known multi-armed bandit framework where the loss is not directly observed by the learner.
no code implementations • 5 Dec 2017 • Joel Veness, Tor Lattimore, Avishkar Bhoopchand, Agnieszka Grabska-Barwinska, Christopher Mattern, Peter Toth
This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss.
no code implementations • NeurIPS 2017 • Tor Lattimore
Existing strategies for finite-armed stochastic bandits mostly depend on a parameter of scale that must be known in advance.
1 code implementation • NeurIPS 2017 • Christoph Dann, Tor Lattimore, Emma Brunskill
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare.
no code implementations • NeurIPS 2016 • Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári
The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.
no code implementations • 14 Oct 2016 • Tor Lattimore, Csaba Szepesvari
Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications.
no code implementations • 16 Aug 2016 • Tom Everitt, Tor Lattimore, Marcus Hutter
Function optimisation is a major challenge in computer science.
no code implementations • NeurIPS 2016 • Finnian Lattimore, Tor Lattimore, Mark D. Reid
We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment.
no code implementations • NeurIPS 2016 • Aurélien Garivier, Emilie Kaufmann, Tor Lattimore
We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards.
no code implementations • NeurIPS 2016 • Sébastien Gerchinovitz, Tor Lattimore
First, the existence of a single arm that is optimal in every round cannot improve the regret in the worst case.
no code implementations • 29 Mar 2016 • Tor Lattimore
I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise.
no code implementations • 25 Feb 2016 • Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter
We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments.
no code implementations • 13 Feb 2016 • Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári
We consider both the stochastic and the adversarial settings, where we propose natural yet novel strategies and analyze the price for maintaining the constraints.
no code implementations • NeurIPS 2015 • Tor Lattimore, Koby Crammer, Csaba Szepesvari
In each time step the learner chooses an allocation of several resource types between a number of tasks.
no code implementations • 18 Nov 2015 • Tor Lattimore
I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon.
no code implementations • NeurIPS 2015 • Tor Lattimore
Given a multi-armed bandit problem it may be desirable to achieve a smaller-than-usual worst-case regret for some special actions.
no code implementations • 28 Jul 2015 • Tor Lattimore
I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret.
no code implementations • NeurIPS 2014 • Tor Lattimore, Remi Munos
We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms.
no code implementations • 15 Jun 2014 • Tor Lattimore, Koby Crammer, Csaba Szepesvári
We study a sequential resource allocation problem involving a fixed number of recurring jobs.
no code implementations • 22 Aug 2013 • Tor Lattimore, Marcus Hutter, Peter Sunehag
We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models.
no code implementations • 29 Jun 2013 • Tor Lattimore, Marcus Hutter, Peter Sunehag
We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence.
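For reference, the error measure used here is easy to compute in the simplest case; the Bernoulli form below is shown purely for illustration of what the KL divergence penalises.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)) for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

print(kl_bernoulli(0.5, 0.5))  # 0.0: a perfect prediction incurs no error
```

The divergence is zero exactly when the prediction matches the truth and grows without bound as the predicted probability of an event that actually occurs approaches zero, which is why cumulative KL error is a natural yardstick for probabilistic predictions.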