no code implementations • 21 Jan 2021 • Ci Chen, Lihua Xie, Yi Jiang, Kan Xie, Shengli Xie
To remove such a requirement, an off-policy reinforcement learning algorithm is proposed using only the measured output data along the trajectories of the system and the reference output.
Dynamical Systems