Search Results for author: Siddharth Chandak

Found 7 papers, 0 papers with code

A Concentration Bound for TD(0) with Function Approximation

no code implementations • 16 Dec 2023 • Siddharth Chandak, Vivek S. Borkar

We derive a concentration bound of the type "for all $n \geq n_0$ for some $n_0$" for TD(0) with linear function approximation.

Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics

no code implementations • 27 Feb 2023 • Siddharth Chandak, Ilai Bistritz, Nicholas Bambos

We prove that UECB achieves a regret of $\mathcal{O}(\log(T)+\tau_c\log(\tau_c)+\tau_c\log\log(T))$ for this equilibrium bandit problem, where $\tau_c$ is the worst-case approximate convergence time to equilibrium.

Reinforcement Learning in Non-Markovian Environments

no code implementations • 3 Nov 2022 • Siddharth Chandak, Pratik Shah, Vivek S. Borkar, Parth Dodhia

Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied on this formulation.

Q-Learning reinforcement-learning +1

A Concentration Bound for LSPE($\lambda$)

no code implementations • 4 Nov 2021 • Siddharth Chandak, Vivek S. Borkar, Harsh Dolhare

The popular LSPE($\lambda$) algorithm for policy evaluation is revisited to derive a concentration bound that gives high-probability performance guarantees from some time on.

Concentration of Contractive Stochastic Approximation and Reinforcement Learning

no code implementations • 27 Jun 2021 • Siddharth Chandak, Vivek S. Borkar, Parth Dodhia

Using a martingale concentration inequality, concentration bounds "from time $n_0$ on" are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises.

Q-Learning reinforcement-learning +1

Prospect-theoretic Q-learning

no code implementations • 12 Apr 2021 • Vivek S. Borkar, Siddharth Chandak

We consider a prospect-theoretic version of the classical Q-learning algorithm for discounted-reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and underrepresents losses relative to a reference point.

Q-Learning

Hidden Markov Model-Based Encoding for Time-Correlated IoT Sources

no code implementations • 19 Jan 2021 • Siddharth Chandak, Federico Chiariotti, Petar Popovski

As the use of Internet of Things (IoT) devices for monitoring purposes becomes ubiquitous, the efficiency of sensor communication is a major issue for the modern Internet.

Networking and Internet Architecture • 94A05 (Primary), 94B35, 62M05 (Secondary) • E.4; H.1.1
