no code implementations • 2 Oct 2023 • Qingfeng Lan, A. Rupam Mahmood
We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting.
1 code implementation • 29 May 2023 • Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli
One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings.
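To make the Gaussian-approximation issue concrete, the sketch below shows Thompson sampling in the one setting where the exact posterior is tractable: a Bernoulli bandit with a conjugate Beta prior. This is an illustrative toy, not the algorithm from the paper above; the class and method names are hypothetical.

```python
import random

class BetaBernoulliTS:
    """Thompson sampling with an exact Beta posterior (no Gaussian
    approximation needed). Illustrative sketch only; names are hypothetical
    and not taken from the paper above."""

    def __init__(self, n_arms):
        # Beta(1, 1) uniform prior over each arm's success probability
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms

    def select_arm(self):
        # Draw one success probability per arm from its posterior,
        # then act greedily with respect to the samples.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Conjugate update for a {0, 1} reward
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
```

Outside such conjugate special cases the posterior has no closed form, which is why practical Thompson sampling methods resort to approximations of the kind the paper criticizes.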
1 code implementation • 3 Feb 2023 • Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu
Reinforcement learning (RL) is fundamentally different from supervised learning, and in practice learned optimizers do not work well even in simple RL tasks.
1 code implementation • 22 May 2022 • Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood
The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and using them for training later.
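The standard component described above can be sketched in a few lines: a bounded FIFO store of transitions with uniform minibatch sampling. This is a generic minimal version for illustration, with assumed capacity and field names, not the buffer used in the paper above.

```python
import random
from collections import deque

class ReplayBuffer:
    """A minimal uniform-sampling experience replay buffer.
    Illustrative sketch only; field names are assumptions."""

    def __init__(self, capacity):
        # Oldest experiences are evicted first once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive transitions and lets each experience be reused
        # for many gradient updates.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```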
1 code implementation • 20 Dec 2021 • Qingfeng Lan
In this work, we develop a quantum reinforcement learning algorithm based on soft actor-critic -- one of the state-of-the-art methods for continuous control.
no code implementations • 29 May 2021 • Qingfeng Lan, Luke Kumar, Martha White, Alona Fyshe
Correlates of secondary information appear in LSTM representations even though they are not part of an explicitly supervised prediction task.
1 code implementation • 9 Mar 2021 • Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, A. Rupam Mahmood
As a key component in reinforcement learning, the reward function is usually devised carefully to guide the agent.
1 code implementation • ICLR 2020 • Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White
Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value.
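The bias stated above is easy to demonstrate numerically: even when every action has the same true value, taking the maximum over noisy estimates is positive on average. The snippet below is a generic Monte-Carlo illustration of this maximization bias, not code from the paper.

```python
import random

def mean_max_estimate(true_values, noise_std, n_trials=10000):
    """Average of max_a Q_hat(s, a) when each estimate equals the true
    value plus zero-mean Gaussian noise. Illustrates the overestimation
    bias of approximating max_a Q(s, a) by max_a Q_hat(s, a)."""
    total = 0.0
    for _ in range(n_trials):
        estimates = [v + random.gauss(0.0, noise_std) for v in true_values]
        total += max(estimates)
    return total / n_trials

# Five actions, all with true value 0: max_a Q(s, a) = 0, yet the
# expected maximum of the noisy estimates is strictly positive.
bias = mean_max_estimate([0.0] * 5, noise_std=1.0)
```

The gap grows with the number of actions and the noise level, which is why unbiased estimation errors still produce systematic overestimation in Q-learning targets.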
no code implementations • 19 Dec 2019 • Zichen Zhang, Qingfeng Lan, Lei Ding, Yue Wang, Negar Hassanpour, Russell Greiner
We learn two groups of latent random variables, where one group corresponds to variables that only cause selection bias, and the other group is relevant for outcome prediction.