2 code implementations • 23 Feb 2024 • Jacob Mitchell Springer, Suhas Kotha, Daniel Fried, Graham Neubig, Aditi Raghunathan
In this work, we address an architectural limitation of autoregressive models: token embeddings cannot contain information from tokens that appear later in the input.
1 code implementation • 18 Sep 2023 • Suhas Kotha, Jacob Mitchell Springer, Aditi Raghunathan
We lack a systematic understanding of the effects of fine-tuning (via methods such as instruction-tuning or reinforcement learning from human feedback), particularly on tasks outside the narrow fine-tuning distribution.