no code implementations • 29 Jan 2023 • Enoch Hyunwook Kang, P. R. Kumar
In online exploration systems where users with fixed preferences repeatedly arrive, it has recently been shown that O(1), i. e., bounded regret, can be achieved when the system is modeled as a linear contextual bandit.