Search Results for author: Siddharth Chandak

Found 7 papers, 0 papers with code

A Concentration Bound for TD(0) with Function Approximation

no code implementations16 Dec 2023 Siddharth Chandak, Vivek S. Borkar

We derive a concentration bound of the type "for all $n \geq n_0$ for some $n_0$" for TD(0) with linear function approximation.
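
The bound concerns the standard TD(0) iteration with a linear value-function approximation. For context only, a minimal sketch of that iteration (not the paper's analysis) is given below; the `sample_transition` and `phi` interfaces, step sizes, and initial state are illustrative assumptions.

```python
import numpy as np

def td0_linear(sample_transition, phi, dim, num_steps=10_000, gamma=0.99, alpha0=0.5, seed=0):
    """Vanilla TD(0) with linear function approximation, V(s) ~ phi(s) @ theta.

    `sample_transition(s, rng)` and `phi(s)` are assumed interfaces: the former
    returns (reward, next_state) under the policy being evaluated, the latter a
    length-`dim` feature vector.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    s = 0  # assumed initial state, purely illustrative
    for n in range(1, num_steps + 1):
        r, s_next = sample_transition(s, rng)
        # TD error: delta_n = r_n + gamma * V(s_{n+1}) - V(s_n)
        delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        theta = theta + (alpha0 / n) * delta * phi(s)  # diminishing step size a_n
        s = s_next
    return theta
```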

Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics

no code implementations27 Feb 2023 Siddharth Chandak, Ilai Bistritz, Nicholas Bambos

We prove that UECB achieves a regret of $\mathcal{O}(\log(T)+\tau_c\log(\tau_c)+\tau_c\log\log(T))$ for this equilibrium bandit problem, where $\tau_c$ is the worst-case approximate convergence time to equilibrium.

Reinforcement Learning in Non-Markovian Environments

no code implementations3 Nov 2022 Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia

Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by the non-Markovianity of observations when the Q-learning algorithm is applied to this formulation.
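
As context for the snippet above, a minimal sketch of tabular Q-learning driven directly by observations (treated as if they were Markov states) follows; the `step_fn` interface, step sizes, and exploration scheme are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def q_learning_on_observations(step_fn, num_obs, num_actions,
                               num_steps=50_000, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning that treats each observation as if it were a Markov state.

    `step_fn(obs, action, rng)` is an assumed interface returning (reward, next_obs);
    when the observations are not Markovian, the learned table carries the kind of
    approximation error the paper quantifies.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((num_obs, num_actions))
    visits = np.zeros((num_obs, num_actions))
    obs = 0  # assumed initial observation
    for _ in range(num_steps):
        # epsilon-greedy action selection on the current Q estimates
        a = rng.integers(num_actions) if rng.random() < eps else int(np.argmax(Q[obs]))
        r, obs_next = step_fn(obs, a, rng)
        visits[obs, a] += 1
        alpha = 1.0 / visits[obs, a]  # standard tabular step size
        # Q-learning update with the observation playing the role of the state
        Q[obs, a] += alpha * (r + gamma * np.max(Q[obs_next]) - Q[obs, a])
        obs = obs_next
    return Q
```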

Q-Learning reinforcement-learning +1

A Concentration Bound for LSPE($\lambda$)

no code implementations4 Nov 2021 Siddharth Chandak, Vivek S. Borkar, Harsh Dolhare

The popular LSPE($\lambda$) algorithm for policy evaluation is revisited to derive a concentration bound that gives high-probability performance guarantees from some time on.
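
For reference, a minimal sketch of the standard LSPE($\lambda$) iteration for linear policy evaluation is given below; the trajectory interface, ridge term, and step size are illustrative assumptions, and the sketch is not the paper's analysis.

```python
import numpy as np

def lspe_lambda(trajectory, phi, dim, gamma=0.99, lam=0.7, alpha=0.5, eps=1e-6):
    """Minimal LSPE(lambda) sketch for linear policy evaluation, V(s) ~ phi(s) @ theta.

    `trajectory` is an assumed iterable of (state, reward, next_state) tuples
    generated by the policy being evaluated; `phi(s)` returns a length-`dim`
    feature vector.
    """
    theta = np.zeros(dim)
    B = eps * np.eye(dim)    # small ridge term so B is invertible early on
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    z = np.zeros(dim)        # eligibility trace
    for s, r, s_next in trajectory:
        f, f_next = phi(s), phi(s_next)
        z = gamma * lam * z + f
        B += np.outer(f, f)
        A += np.outer(z, gamma * f_next - f)
        b += r * z
        # LSPE(lambda) step: move theta toward the running least-squares solution
        theta = theta + alpha * np.linalg.solve(B, A @ theta + b)
    return theta
```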

Concentration of Contractive Stochastic Approximation and Reinforcement Learning

no code implementations27 Jun 2021 Siddharth Chandak, Vivek S. Borkar, Parth Dodhia

Using a martingale concentration inequality, concentration bounds "from time $n_0$ on" are derived for stochastic approximation algorithms with contractive maps and both martingale-difference and Markov noises.
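
As a concrete instance of the setting, the following sketch iterates a generic contractive stochastic approximation scheme; the map `F`, the i.i.d. Gaussian noise standing in for a martingale-difference sequence, and the step sizes are illustrative assumptions.

```python
import numpy as np

def contractive_sa(F, dim, num_steps=10_000, alpha0=1.0, noise_std=0.1, seed=0):
    """Iterates x_{n+1} = x_n + a_n * (F(x_n) + M_{n+1} - x_n), where F is assumed
    to be a contraction and M_{n+1} is martingale-difference noise (here i.i.d.
    Gaussian, the simplest such example)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    for n in range(1, num_steps + 1):
        noise = noise_std * rng.standard_normal(dim)
        x = x + (alpha0 / n) * (F(x) + noise - x)  # diminishing step size a_n
    return x

# Example: an affine contraction whose fixed point solves x = 0.5 * x + 1, i.e. x = 2.
x_star = contractive_sa(lambda x: 0.5 * x + 1.0, dim=1)
```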

Q-Learning reinforcement-learning +1

Prospect-theoretic Q-learning

no code implementations12 Apr 2021 Vivek S. Borkar, Siddharth Chandak

We consider a prospect-theoretic version of the classical Q-learning algorithm for discounted-reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and underrepresents losses relative to a reference point.
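
One plausible reading of this setup, sketched below for illustration only, passes the bootstrapped future-value term of tabular Q-learning through a reference-point nonlinearity that accentuates gains and underrepresents losses; the distortion parameters, `step_fn` interface, and placement of the nonlinearity are assumptions, not the paper's exact update.

```python
import numpy as np

def distort(x, ref=0.0, gain=1.5, loss=0.5):
    """Illustrative reference-point nonlinearity: gains above `ref` are accentuated,
    losses below `ref` are underrepresented (parameters are assumed)."""
    d = x - ref
    return ref + np.where(d >= 0, gain * d, loss * d)

def prospect_q_learning(step_fn, num_states, num_actions,
                        num_steps=50_000, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning in which the bootstrapped future-reward term is passed
    through the distortion before being used in the update."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((num_states, num_actions))
    visits = np.zeros((num_states, num_actions))
    s = 0  # assumed initial state
    for _ in range(num_steps):
        a = rng.integers(num_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        r, s_next = step_fn(s, a, rng)
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]
        # the controller perceives a distorted version of the future reward
        target = r + gamma * distort(np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q
```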

Q-Learning

Hidden Markov Model-Based Encoding for Time-Correlated IoT Sources

no code implementations19 Jan 2021 Siddharth Chandak, Federico Chiariotti, Petar Popovski

As the use of Internet of Things (IoT) devices for monitoring purposes becomes ubiquitous, the efficiency of sensor communication is a major issue for the modern Internet.

Subjects: Networking and Internet Architecture. MSC classes: 94A05 (Primary), 94B35, 62M05 (Secondary). ACM classes: E.4; H.1.1
