no code implementations • 26 Jul 2017 • Yonatan Mintz, Anil Aswani, Philip Kaminsky, Elena Flowers, Yoshimi Fukuoka
Many settings involve sequential decision-making where a set of actions can be chosen at each time step, each action provides a stochastic reward, and the distribution for the reward of each action is initially unknown.