13 Sep 2021 • Michalis Korakakis, Andreas Vlachos
In this paper, we conduct systematic experiments and find that scheduled sampling ameliorates exposure bias by increasing model reliance on the input sequence, but worsens performance when the prefix at inference time is correct, a form of catastrophic forgetting.
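For readers unfamiliar with the technique, the following is a minimal sketch of how scheduled sampling builds the decoder prefix during training. It is not the authors' implementation: the function name `scheduled_sampling_prefix`, the `predict_next` callback, and the fixed `sample_prob` are all hypothetical, and a real setup would typically anneal the probability over training and operate on tensors rather than token lists.

```python
import random
from typing import Callable, List, Sequence


def scheduled_sampling_prefix(
    gold: Sequence[str],
    predict_next: Callable[[List[str]], str],
    sample_prob: float,
    rng: random.Random,
) -> List[str]:
    """Build the prefix fed to the decoder for one training example.

    At each position the model's own prediction is used with probability
    `sample_prob`; otherwise the gold token is used (teacher forcing).
    """
    prefix: List[str] = []
    for gold_token in gold:
        if rng.random() < sample_prob:
            # Expose the model to its own (possibly wrong) output.
            prefix.append(predict_next(prefix))
        else:
            # Standard teacher forcing: condition on the reference token.
            prefix.append(gold_token)
    return prefix


if __name__ == "__main__":
    # Hypothetical stand-in for a trained model: always predicts "<unk>".
    toy_model = lambda prefix: "<unk>"
    print(scheduled_sampling_prefix(["a", "b", "c"], toy_model, 0.5, random.Random(0)))
```

With `sample_prob=0.0` this reduces to plain teacher forcing, where the prefix is always correct; with `sample_prob=1.0` the model conditions only on its own predictions. The finding summarized above concerns the trade-off between these two regimes.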