no code implementations • 15 Nov 2013 • M. M. Hassan Mahmud, Majd Hawasly, Benjamin Rosman, Subramanian Ramamoorthy
The source subset forms an `$\epsilon$-net' over the original set of MDPs, in the sense that for each previous MDP $M_p$, there is a source $M^s$ whose optimal policy has $<\epsilon$ regret in $M_p$.