no code implementations • ICML 2020 • Somdeb Majumdar, Shauharda Khadka, Santiago Miret, Stephen Mcaleer, Kagan Tumer
Training policies solely on the team-based reward is often difficult due to its sparsity.
no code implementations • 8 Oct 2020 • Hassam Sheikh, Shauharda Khadka, Santiago Miret, Somdeb Majumdar
We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks.
no code implementations • ICLR 2021 • Shauharda Khadka, Estelle Aflalo, Mattias Marder, Avrech Ben-David, Santiago Miret, Shie Mannor, Tamir Hazan, Hanlin Tang, Somdeb Majumdar
For deep neural network accelerators, memory movement is energetically expensive and can bound computation.
no code implementations • 18 Jun 2019 • Shauharda Khadka, Somdeb Majumdar, Santiago Miret, Stephen Mcaleer, Kagan Tumer
Training policies solely on the team-based reward is often difficult due to its sparsity.
1 code implementation • 2 May 2019 • Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer
Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks.
1 code implementation • 7 Feb 2019 • Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty, Jennifer Hicks, Sean F. Carroll, Bo Zhou, Hongsheng Zeng, Fan Wang, Rongzhong Lian, Hao Tian, Wojciech Jaśkowski, Garrett Andersen, Odd Rune Lykkebø, Nihat Engin Toklu, Pranav Shyam, Rupesh Kumar Srivastava, Sergey Kolesnikov, Oleksii Hrinchuk, Anton Pechenko, Mattias Ljungström, Zhen Wang, Xu Hu, Zehong Hu, Minghui Qiu, Jun Huang, Aleksei Shpilman, Ivan Sosin, Oleg Svidchenko, Aleksandra Malysheva, Daniel Kudenko, Lance Rane, Aditya Bhatt, Zhengfei Wang, Penghui Qi, Zeyang Yu, Peng Peng, Quan Yuan, Wenxin Li, Yunsheng Tian, Ruihan Yang, Pingchuan Ma, Shauharda Khadka, Somdeb Majumdar, Zach Dwiel, Yinyin Liu, Evren Tumer, Jeremy Watson, Marcel Salathé, Sergey Levine, Scott Delp
In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector.
6 code implementations • NeurIPS 2018 • Shauharda Khadka, Kagan Tumer
However, deep reinforcement learning methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack of effective exploration, and brittle convergence properties that are extremely sensitive to hyperparameters.