no code implementations • 16 May 2024 • Joongkyu Lee, Min-hwan Oh
To the best of our knowledge, this is the first work in the MNL contextual bandit literature to prove minimax optimality -- in either the uniform or the non-uniform reward setting -- and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
no code implementations • 8 Feb 2024 • Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh
In reinforcement learning, temporal abstraction in the action space, exemplified by action repetition, is a technique to facilitate policy learning through extended actions.
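As a rough illustration of action repetition, the sketch below holds one chosen action fixed for several consecutive environment steps and accumulates the reward. It is not taken from the paper: `CountingEnv`, `repeat_action`, and the reward values are hypothetical names and numbers, assumed here only to make the idea concrete.

```python
class CountingEnv:
    """Hypothetical toy environment: the state is an integer and each action is added to it."""

    def __init__(self, goal=10):
        self.goal = goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        done = self.state >= self.goal
        # Assumed reward scheme: small step cost, bonus on reaching the goal.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def repeat_action(env, action, repeats):
    """Apply `action` up to `repeats` times, summing rewards.

    Stops early if the episode terminates, and returns the last state,
    the accumulated reward, the done flag, and the number of steps taken.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total_reward = 0.0
    steps = 0
    for _ in range(repeats):
        state, reward, done = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return state, total_reward, done, steps


if __name__ == "__main__":
    env = CountingEnv(goal=10)
    env.reset()
    # The agent makes one decision (action=2) that covers up to 3 primitive steps.
    print(repeat_action(env, action=2, repeats=3))
```

From the agent's point of view, each call to `repeat_action` is a single extended action, so it makes fewer decisions per episode. The early stop on termination keeps the accumulated reward consistent with what the environment actually produced.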