no code implementations • 9 Feb 2024 • Tor Lattimore
Bandit convex optimisation is a fundamental framework for studying zeroth-order convex optimisation.
no code implementations • 17 May 2023 • Xin Zhou, Botao Hao, Jian Kang, Tor Lattimore, Lexin Li
A brain-computer interface (BCI) is a technology that enables direct communication between the brain and an external device or computer system.
no code implementations • 10 Feb 2023 • Tor Lattimore, András György
We introduce a simple and efficient algorithm for unconstrained zeroth-order stochastic convex bandits and prove its regret is at most $(1 + r/d)[d^{1.5}\sqrt{n} + d^3]\,\mathrm{polylog}(n, d, r)$ where $n$ is the horizon, $d$ is the dimension and $r$ is the radius of a known ball containing the minimiser of the loss.
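As a quick sanity check on how this bound scales, the sketch below evaluates it numerically; the leading constant and the use of $\log(ndr)$ as a stand-in for the unspecified polylogarithmic factor are assumptions for illustration only.

```python
import math

def regret_bound(n, d, r, c=1.0):
    """Evaluate (1 + r/d) * [d^1.5 * sqrt(n) + d^3] * polylog(n, d, r),
    with log(n*d*r) standing in for the polylog factor and c for the
    hidden constant (both illustrative assumptions)."""
    polylog = math.log(n * d * r)
    return c * (1 + r / d) * (d ** 1.5 * math.sqrt(n) + d ** 3) * polylog

# For fixed d and r the bound grows roughly like sqrt(n) in the horizon.
print(regret_bound(n=10_000, d=10, r=5.0))
```

Quadrupling the horizon roughly doubles the bound, reflecting the $\sqrt{n}$ dependence once the $d^3$ term is dominated.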
no code implementations • 7 Feb 2023 • Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen
This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level.
no code implementations • 7 Feb 2023 • Johannes Kirschner, Tor Lattimore, Andreas Krause
Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models.
no code implementations • 9 Jun 2022 • Botao Hao, Tor Lattimore
Information-directed sampling (IDS) has revealed its potential as a data-efficient algorithm for reinforcement learning (RL).
no code implementations • 26 May 2022 • Sanae Amani, Tor Lattimore, András György, Lin F. Yang
In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds.
no code implementations • 22 May 2022 • Botao Hao, Tor Lattimore, Chao Qin
Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm.
no code implementations • 22 Feb 2022 • Tor Lattimore
We show that a version of the generalised information ratio of Lattimore and György (2020) determines the asymptotic minimax regret for all finite-action partial monitoring games provided that (a) the standard definition of regret is used but the latent space where the adversary plays is potentially infinite; or (b) the regret introduced by Rustichini (1999) is used and the latent space is finite.
no code implementations • NeurIPS 2021 • Brendan O'Donoghue, Tor Lattimore
We derive a set of Bayesian 'optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy.
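For context, one round of the Thompson sampling policy mentioned here can be sketched in a few lines for Bernoulli rewards; the Beta(1,1) priors and the two arm means are illustrative assumptions, not details from the paper.

```python
import random

def thompson_sample_arm(successes, failures):
    """One round of Thompson sampling for Bernoulli bandits: draw one
    posterior sample per arm from its Beta posterior and play the argmax."""
    samples = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# Small simulation: two arms with (assumed) means 0.3 and 0.7.
random.seed(0)
means = [0.3, 0.7]
succ, fail = [0, 0], [0, 0]
for _ in range(2000):
    arm = thompson_sample_arm(succ, fail)
    reward = 1 if random.random() < means[arm] else 0
    succ[arm] += reward
    fail[arm] += 1 - reward
print(succ, fail)  # the better arm accumulates most of the pulls
```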
no code implementations • 5 Jul 2021 • Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
no code implementations • NeurIPS 2021 • Tor Lattimore, Botao Hao
We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, \theta_\star\rangle^2$ where $\theta_\star \in \mathbb R^d$ is an unknown parameter vector.
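The reward model in this bandit phase retrieval problem is easy to state in code; the additive Gaussian noise below is an assumption for illustration, since the snippet above only fixes the expected reward.

```python
import random

def phase_retrieval_reward(action, theta, noise_std=0.1):
    """Reward with mean <a, theta>^2, plus Gaussian noise
    (noise model assumed here; the abstract only fixes the mean)."""
    mean = sum(a * t for a, t in zip(action, theta)) ** 2
    return mean + random.gauss(0.0, noise_std)

theta = [0.6, 0.8]        # unknown unit-norm parameter
aligned = [0.6, 0.8]      # action aligned with theta: expected reward 1
orthogonal = [-0.8, 0.6]  # orthogonal action: expected reward 0
```

The squared inner product is what makes this harder than a linear bandit: orthogonal actions are entirely uninformative about the sign of $\theta_\star$.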
no code implementations • 1 Jun 2021 • Tor Lattimore
We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(\langle x, \theta\rangle)$ for convex $g_t : \mathbb R \to \mathbb R$ and unknown $\theta \in \mathbb R^d$ that is homogeneous over time.
no code implementations • NeurIPS 2021 • Botao Hao, Tor Lattimore, Wei Deng
Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure.
no code implementations • 6 Apr 2021 • Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans
First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.
no code implementations • 6 Jan 2021 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos
Exploration is essential for solving complex Reinforcement Learning (RL) tasks.
no code implementations • 11 Nov 2020 • Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári
We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.
no code implementations • NeurIPS 2020 • Botao Hao, Tor Lattimore, Mengdi Wang
Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.
no code implementations • 8 Nov 2020 • Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.
no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.
no code implementations • 25 Sep 2020 • Tor Lattimore, András György
We establish a connection between the stability of mirror descent and the information ratio of Russo and Van Roy [2014].
2 code implementations • NeurIPS 2020 • David Budden, Adam Marblestone, Eren Sezener, Tor Lattimore, Greg Wayne, Joel Veness
We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.
no code implementations • 9 Jun 2020 • Brendan O'Donoghue, Tor Lattimore, Ian Osband
We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff.
no code implementations • 31 May 2020 • Tor Lattimore
We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5}\sqrt{n}\log(n))$, where $d$ is the dimension and $n$ is the number of interactions.
no code implementations • NeurIPS 2020 • Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari
Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee.
no code implementations • 25 Feb 2020 • Johannes Kirschner, Tor Lattimore, Andreas Krause
Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits.
no code implementations • ICML 2020 • Tor Lattimore, Csaba Szepesvari, Gellert Weisz
The construction by Du et al. (2019) implies that even if a learner is given linear features in $\mathbb R^d$ that approximate the rewards in a bandit with a uniform error of $\epsilon$, then searching for an action that is optimal up to $O(\epsilon)$ requires examining essentially all actions.
no code implementations • 15 Oct 2019 • Botao Hao, Tor Lattimore, Csaba Szepesvari
Contextual bandits serve as a fundamental model for many sequential decision making tasks.
2 code implementations • 30 Sep 2019 • Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter
This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs).
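A minimal sketch of the geometric-mixing rule at the heart of a GLN neuron: each neuron combines input probabilities by taking a weighted sum in logit space and mapping back through the sigmoid. The context-dependent gating that selects the weights, which gives the networks their name, is omitted here for brevity.

```python
import math

def geometric_mixing(probs, weights):
    """Combine input probabilities as sigmoid(sum_i w_i * logit(p_i)),
    the mixing rule used by Gated Linear Network neurons."""
    logit = lambda p: math.log(p / (1 - p))
    z = sum(w * logit(p) for w, p in zip(weights, probs))
    return 1 / (1 + math.exp(-z))

# With weights summing to 1, mixing two identical predictions is a no-op.
print(geometric_mixing([0.8, 0.8], [0.5, 0.5]))  # 0.8
```

A useful property of this rule is that weight vectors with a single 1 recover individual inputs exactly, so the mixture can never be forced to do worse than its best input in the log-loss sense.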
no code implementations • 25 Sep 2019 • Yoram Bachrach, Tor Lattimore, Marta Garnelo, Julien Perolat, David Balduzzi, Thomas Anthony, Satinder Singh, Thore Graepel
We show that MARL converges to the desired outcome if the rewards are designed so that exerting effort is the iterated dominance solution, but fails if it is merely a Nash equilibrium.
2 code implementations • ICLR 2020 • Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado van Hasselt
bsuite is a collection of carefully designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives.
no code implementations • 30 Jul 2019 • Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant
For graph search, A* can require $\Omega(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound.
no code implementations • 12 Jul 2019 • Tor Lattimore, Csaba Szepesvari
We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games for which the $n$-round minimax regret is bounded by $6(d+1) k^{3/2} \sqrt{n \log(k)}$, matching the best known information-theoretic upper bound.
no code implementations • 7 Jun 2019 • Laurent Orseau, Levi H. S. Lelis, Tor Lattimore
Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree.
no code implementations • NeurIPS 2019 • Julian Zimmert, Tor Lattimore
The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings.
no code implementations • 19 Mar 2019 • Roman Pogodin, Tor Lattimore
Finally, we study bounds that depend on the degree of separation of the arms, generalising the results of Cowan and Katehakis [2015] from the stochastic setting to the adversarial setting, and improving the result of Seldin and Slivkins [2014] by a factor of $\log(n)/\log(\log(n))$.
no code implementations • 27 Feb 2019 • Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, Pushmeet Kohli
Machine learning is used extensively in recommender systems deployed in products.
no code implementations • 1 Feb 2019 • Tor Lattimore, Csaba Szepesvari
We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary.
no code implementations • NeurIPS 2019 • Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle
We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks.
no code implementations • 8 Jan 2019 • Laurent Orseau, Tor Lattimore, Shane Legg
We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms.
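A standard baseline in this setting, and a useful reference point for the robust algorithms the paper derives, is the Bayesian mixture predictor: predict with the weight-averaged expert forecast, then reweight each expert by the probability it assigned to the realised outcome. The two experts and their predictions below are illustrative assumptions.

```python
def bayes_mixture_update(weights, expert_probs, outcome):
    """One step of the Bayes mixture for binary prediction under log-loss:
    predict the weighted average, then multiply each expert's weight by the
    probability it gave to the realised outcome and renormalise."""
    p = lambda q: q if outcome == 1 else 1 - q
    prediction = sum(w * q for w, q in zip(weights, expert_probs))
    new = [w * p(q) for w, q in zip(weights, expert_probs)]
    total = sum(new)
    return prediction, [w / total for w in new]

# Two experts; the second keeps predicting the observed outcome well.
w = [0.5, 0.5]
for outcome in [1, 1, 1]:
    pred, w = bayes_mixture_update(w, [0.5, 0.9], outcome)
print(w)  # posterior weight concentrates on the better expert
```

The multiplicative update is exactly Bayes' rule, which is why the mixture's cumulative log-loss exceeds the best expert's by at most the log of the inverse prior weight.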
1 code implementation • NeurIPS 2018 • Laurent Orseau, Levi H. S. Lelis, Tor Lattimore, Théophane Weber
We introduce two novel tree search algorithms that use a policy to guide search.
no code implementations • 13 Nov 2018 • Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore
Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.
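The arm-selection step described here can be sketched directly; in this simplified sketch each arm's history is assumed to already contain the pseudo rewards, and the resampling is a plain nonparametric bootstrap.

```python
import random

def bootstrap_arm(histories):
    """Pull the arm whose bootstrap resample (sampling with replacement
    from its reward history, pseudo rewards included) has the highest mean."""
    def boot_mean(history):
        sample = [random.choice(history) for _ in history]
        return sum(sample) / len(sample)
    return max(range(len(histories)), key=lambda i: boot_mean(histories[i]))
```

The randomness of the resampled mean plays the role that posterior sampling plays in Thompson sampling: arms with short or noisy histories get optimistic draws often enough to keep being explored.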
no code implementations • 5 Oct 2018 • Shuai Li, Tor Lattimore, Csaba Szepesvári
We introduce a new model for online ranking in which the click probability factors into an examination and attractiveness function and the attractiveness function is a linear function of a feature vector and an unknown parameter.
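The factored click model is simple to write down; the examination probabilities and parameter vector below are illustrative values, since in the paper both the parameter and (depending on the variant) the examination function are unknowns to be learned.

```python
def click_probability(position, features, theta, examination):
    """Click probability = examination chance of the position times a
    linear attractiveness <features, theta> of the item."""
    attractiveness = sum(x * t for x, t in zip(features, theta))
    return examination[position] * attractiveness

# Higher positions are examined more often (assumed values).
examination = [0.9, 0.6, 0.4]
theta = [0.5, 0.5]
print(click_probability(0, [0.8, 0.6], theta, examination))  # 0.9 * 0.7 = 0.63
```

The factorisation is what makes learning tractable: clicks at different positions share the same attractiveness function, so feedback on one list informs the ranking of items never shown together.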
no code implementations • ICML 2020 • Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner
Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation.
no code implementations • 15 Jun 2018 • Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi
In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists.
no code implementations • NeurIPS 2018 • Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari
Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user.
no code implementations • 23 May 2018 • Tor Lattimore, Csaba Szepesvari
Partial monitoring is a generalization of the well-known multi-armed bandit framework where the loss is not directly observed by the learner.
no code implementations • 5 Dec 2017 • Joel Veness, Tor Lattimore, Avishkar Bhoopchand, Agnieszka Grabska-Barwinska, Christopher Mattern, Peter Toth
This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss.
no code implementations • NeurIPS 2017 • Tor Lattimore
Existing strategies for finite-armed stochastic bandits mostly depend on a parameter of scale that must be known in advance.
1 code implementation • NeurIPS 2017 • Christoph Dann, Tor Lattimore, Emma Brunskill
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare.
no code implementations • NeurIPS 2016 • Ruitong Huang, Tor Lattimore, András György, Csaba Szepesvári
The follow the leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are convex and positively curved.
no code implementations • 14 Oct 2016 • Tor Lattimore, Csaba Szepesvari
Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications.
no code implementations • 16 Aug 2016 • Tom Everitt, Tor Lattimore, Marcus Hutter
Function optimisation is a major challenge in computer science.
no code implementations • NeurIPS 2016 • Finnian Lattimore, Tor Lattimore, Mark D. Reid
We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment.
no code implementations • NeurIPS 2016 • Aurélien Garivier, Emilie Kaufmann, Tor Lattimore
We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards.
no code implementations • NeurIPS 2016 • Sébastien Gerchinovitz, Tor Lattimore
First, the existence of a single arm that is optimal in every round cannot improve the regret in the worst case.
no code implementations • 29 Mar 2016 • Tor Lattimore
I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise.
no code implementations • 25 Feb 2016 • Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter
We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments.
no code implementations • 13 Feb 2016 • Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári
We consider both the stochastic and the adversarial settings, where we propose natural yet novel strategies and analyze the price for maintaining the constraints.
no code implementations • NeurIPS 2015 • Tor Lattimore, Koby Crammer, Csaba Szepesvari
In each time step the learner chooses an allocation of several resource types between a number of tasks.
no code implementations • 18 Nov 2015 • Tor Lattimore
I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon.
no code implementations • NeurIPS 2015 • Tor Lattimore
Given a multi-armed bandit problem it may be desirable to achieve a smaller-than-usual worst-case regret for some special actions.
no code implementations • 28 Jul 2015 • Tor Lattimore
I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret.
no code implementations • NeurIPS 2014 • Tor Lattimore, Remi Munos
We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms.
no code implementations • 15 Jun 2014 • Tor Lattimore, Koby Crammer, Csaba Szepesvári
We study a sequential resource allocation problem involving a fixed number of recurring jobs.
no code implementations • 22 Aug 2013 • Tor Lattimore, Marcus Hutter, Peter Sunehag
We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models.
no code implementations • 29 Jun 2013 • Tor Lattimore, Marcus Hutter, Peter Sunehag
We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence.
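For reference, the error measure used here is easy to compute in the simplest case; the Bernoulli form below is shown purely for illustration of what the KL divergence penalises.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)) for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

print(kl_bernoulli(0.5, 0.5))  # 0.0: a perfect prediction incurs no error
```

The divergence is zero exactly when the prediction matches the truth and grows without bound as the predicted probability of an event that actually occurs approaches zero, which is why cumulative KL error is a natural yardstick for probabilistic predictions.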