no code implementations • NeurIPS 2020 • Giancarlo Kerg, Bhargav Kanuparthi, Anirudh Goyal Alias Parth Goyal, Kyle Goyette, Yoshua Bengio, Guillaume Lajoie
Attention and self-attention mechanisms are now central to state-of-the-art deep learning on sequential tasks.
no code implementations • 16 Jun 2020 • Giancarlo Kerg, Bhargav Kanuparthi, Anirudh Goyal, Kyle Goyette, Yoshua Bengio, Guillaume Lajoie
Attention and self-attention mechanisms are now central to state-of-the-art deep learning on sequential tasks.
1 code implementation • ICLR 2019 • Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio
This problem becomes more evident in tasks where the information needed to solve them correctly exists over long time scales, because the exploding and vanishing gradient problem (EVGP) prevents important gradient components from being back-propagated adequately over a large number of steps.