no code implementations • 9 Feb 2023 • Seungki Min, Daniel Russo
In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves.
no code implementations • 28 Jan 2022 • Seungki Min, Ciamac C. Moallemi, Costis Maglaras
Because our problem is a special case of a linear-quadratic-Gaussian control problem with a conditional value-at-risk (CVaR) objective, these results may be of interest in broader settings.
no code implementations • 30 Jun 2020 • Seungki Min, Ciamac C. Moallemi, Daniel J. Russo
We study the use of policy gradient algorithms to optimize over a class of generalized Thompson sampling policies.
1 code implementation • NeurIPS 2019 • Seungki Min, Costis Maglaras, Ciamac C. Moallemi
Within this framework, we define an intuitive family of control policies that includes Thompson sampling (TS) and the Bayesian optimal policy as endpoints.