no code implementations • 13 Aug 2023 • Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani
On top of the agent's learning, the principal trains a parallel algorithm and faces a trade-off between consistently estimating the agent's unknown rewards and maximizing its own utility by offering adaptive incentives that steer the agent.
no code implementations • 14 Apr 2023 • Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani
Motivated by a number of real-world applications from domains like healthcare and sustainable transportation, in this paper we study a scenario of repeated principal-agent games within a multi-armed bandit (MAB) framework, in which the principal offers a different incentive for each bandit arm, the agent picks a bandit arm to maximize its own expected reward plus incentive, and the principal observes which arm is chosen and receives a reward (different from that of the agent) for the chosen arm.
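To make the interaction protocol concrete, here is a minimal sketch of a single round of the game described above. It is not the authors' algorithm: the function name `play_round`, the use of known per-arm means for the simulated agent, and the Gaussian noise on the principal's reward are all illustrative assumptions. It only shows the order of events: incentives are posted, the agent best-responds, and the principal sees the chosen arm and its own reward.

```python
import random
from typing import Sequence, Tuple


def play_round(
    incentives: Sequence[float],
    agent_means: Sequence[float],
    principal_means: Sequence[float],
    rng: random.Random,
    noise_sd: float = 0.1,
) -> Tuple[int, float]:
    """One round of a repeated principal-agent bandit game (hypothetical sketch).

    incentives: per-arm incentive offered by the principal.
    agent_means: agent's expected reward per arm (unknown to the principal).
    principal_means: principal's expected reward per arm.
    """
    # Agent: pick the arm maximizing its expected reward plus incentive.
    scores = [mu + pi for mu, pi in zip(agent_means, incentives)]
    arm = max(range(len(scores)), key=scores.__getitem__)
    # Principal: observes only which arm was chosen and draws its own
    # reward for that arm (Gaussian noise is an assumption for illustration).
    reward = rng.gauss(principal_means[arm], noise_sd)
    return arm, reward


rng = random.Random(0)
arm, reward = play_round([0.3, 0.0, 0.0], [0.5, 0.7, 0.2], [1.0, 0.1, 0.4], rng)
print(arm)  # 0
```

Here an incentive of 0.3 on arm 0 lifts its score to 0.8, above arm 1's 0.7, so the agent switches to the arm the principal prefers. In the repeated setting, the principal must infer `agent_means` from the sequence of chosen arms alone, which is the source of the estimation-versus-utility trade-off.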
no code implementations • 4 Aug 2021 • Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani
A significant theoretical challenge in the nonlinear setting is that there is no explicit characterization of an optimal controller for a given set of cost and system parameters.