no code implementations • 17 Oct 2022 • Fan Lu, Prashant Mehta, Sean Meyn, Gergely Neu
The main contributions follow: (i) The dual of convex Q-learning is not precisely Manne's LP or a version of logistic Q-learning, but has similar structure that reveals the need for regularization to avoid over-fitting.
no code implementations • 6 Jul 2013 • Wei Chen, Dayu Huang, Ankur A. Kulkarni, Jayakrishnan Unnikrishnan, Quanyan Zhu, Prashant Mehta, Sean Meyn, Adam Wierman
Neuro-dynamic programming is a class of powerful techniques for approximating the solution to dynamic programming equations.