no code implementations • 16 Dec 2023 • Siddharth Chandak, Vivek S. Borkar
We derive a concentration bound of the type "for all $n \geq n_0$ for some $n_0$" for TD(0) with linear function approximation.
no code implementations • 27 Feb 2023 • Siddharth Chandak, Ilai Bistritz, Nicholas Bambos
We prove that UECB achieves a regret of $\mathcal{O}(\log(T)+\tau_c\log(\tau_c)+\tau_c\log\log(T))$ for this equilibrium bandit problem, where $\tau_c$ is the worst-case approximate convergence time to equilibrium.
no code implementations • 3 Nov 2022 • Siddharth Chandak, Pratik Shah, Vivek S. Borkar, Parth Dodhia
Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied on this formulation.
no code implementations • 4 Nov 2021 • Siddharth Chandak, Vivek S. Borkar, Harsh Dolhare
The popular LSPE($\lambda$) algorithm for policy evaluation is revisited to derive a concentration bound that gives high probability performance guarantees from some time on.
no code implementations • 27 Jun 2021 • Siddharth Chandak, Vivek S. Borkar, Parth Dodhia
Using a martingale concentration inequality, concentration bounds "from time $n_0$ on" are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises.
no code implementations • 12 Apr 2021 • Vivek S. Borkar, Siddharth Chandak
We consider a prospect theoretic version of the classical Q-learning algorithm for discounted reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and underrepresents losses relative to a reference point.
no code implementations • 19 Jan 2021 • Siddharth Chandak, Federico Chiariotti, Petar Popovski
As the use of Internet of Things (IoT) devices for monitoring purposes becomes ubiquitous, the efficiency of sensor communication is a major issue for the modern Internet.
Subjects: Networking and Internet Architecture • MSC classes: 94A05 (Primary), 94B35, 62M05 (Secondary) • ACM classes: E.4; H.1.1