3 code implementations • 9 Jul 2019 • Ran Wang, Haibo Su, Chunye Wang, Kailin Ji, Jupeng Ding
In this regard, Peters et al. perform several experiments demonstrating that it is better to adapt BERT with a lightweight task-specific head, keeping the parameters of the pre-trained language model frozen, than to build a complex head on top of it.
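To make the frozen-encoder setup concrete, here is a minimal sketch using PyTorch and the Hugging Face `transformers` library (neither is prescribed by the paper itself); the model name, head size, and learning rate are illustrative assumptions, not values from the original work:

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizer


class FrozenBertClassifier(nn.Module):
    """A lightweight task-specific head on top of a frozen BERT encoder."""

    def __init__(self, num_labels: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        # model_name is an illustrative choice, not one mandated by the paper.
        self.bert = BertModel.from_pretrained(model_name)
        # Freeze every parameter of the pre-trained language model;
        # only the head below receives gradient updates.
        for param in self.bert.parameters():
            param.requires_grad = False
        # A single linear layer serves as the "lightweight" head.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Classify from the pooled [CLS] representation.
        return self.classifier(outputs.pooler_output)


tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = FrozenBertClassifier(num_labels=2)
batch = tokenizer(["a sample sentence"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
# Only the head's parameters are handed to the optimizer,
# so the encoder stays fixed throughout training.
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
```

The design choice this illustrates is that gradient updates touch only the small linear head, so training is cheap and the pre-trained representations remain intact.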