no code implementations • 9 Sep 2020 • Jian Hu, Seth Austin Harding, Haibin Wu, Siyue Hu, Shih-wei Liao
Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the value of long-term returns as a scalar that does not contain the information of randomness.