9 Feb 2021 • X. Flora Meng, Tuhin Sarkar, Munther A. Dahleh
We prove a high-probability upper bound of $\tilde{\mathcal{O}} \big( i^*K + \sqrt{KT} \big)$ on the regret, up to polylog factors, where $i^*$ is the unknown position of the best expert, $K$ is the number of actions, and $T$ is the time horizon.
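One way to read the bound term by term is to compare its two summands. The following is our own arithmetic on the stated expression, not a derivation taken from the paper. The $i^*K$ term dominates only when the best expert sits far down the ordering. Otherwise the regret is of the same order as the $\sqrt{KT}$ term, and the per-round regret vanishes as $T$ grows.

```latex
% Which term of \tilde{O}(i^*K + \sqrt{KT}) dominates (arithmetic on the stated bound only)
\begin{aligned}
i^*K \le \sqrt{KT} &\iff (i^*)^2 K^2 \le KT \iff i^* \le \sqrt{T/K},\\[4pt]
\tilde{\mathcal{O}}\big(i^*K + \sqrt{KT}\big) &=
\begin{cases}
\tilde{\mathcal{O}}\big(\sqrt{KT}\big), & i^* \le \sqrt{T/K},\\
\tilde{\mathcal{O}}\big(i^*K\big), & i^* > \sqrt{T/K},
\end{cases}\\[4pt]
\frac{1}{T}\,\tilde{\mathcal{O}}\big(i^*K + \sqrt{KT}\big) &= \tilde{\mathcal{O}}\!\left(\frac{i^*K}{T} + \sqrt{\frac{K}{T}}\right) \xrightarrow[T \to \infty]{} 0 \quad \text{for fixed } i^*, K.
\end{aligned}
```

For an illustration with hypothetical values, take $K = 10$ and $T = 10^6$. Then $\sqrt{T/K} = \sqrt{10^5} \approx 316$. For any $i^* \le 316$, the bound is therefore of the same order as $\sqrt{KT} = \sqrt{10^7} \approx 3162$.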