Transformers

RoBERTa is an extension of BERT with changes to the pretraining procedure. The modifications include:

  • training the model longer, with bigger batches, over more data
  • removing the next sentence prediction objective
  • training on longer sequences
  • dynamically changing the masking pattern applied to the training data (a sketch of this follows below)

The authors also collect a large new dataset (CC-News) of comparable size to other privately used datasets, to better control for training set size effects.
Source: RoBERTa: A Robustly Optimized BERT Pretraining Approach
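Of these changes, dynamic masking is the easiest to picture in code: the masked positions are re-sampled every time a sequence is fed to the model, instead of being fixed once during preprocessing as in the original BERT setup. The sketch below illustrates the idea in plain Python; the token ids (MASK_ID, the example sequence) are placeholders, and while the 80/10/10 replacement scheme follows the standard BERT recipe that RoBERTa retains, the helper itself is illustrative rather than taken from any particular library.

import random

MASK_ID = 4          # placeholder id for the mask token
VOCAB_SIZE = 50265   # RoBERTa's BPE vocabulary size
MASK_PROB = 0.15     # fraction of tokens selected for prediction

def dynamic_mask(token_ids, rng=random):
    # Re-sample the mask on every call, so the same sentence sees a
    # different masking pattern each epoch (dynamic masking), unlike
    # static masking, which fixes the pattern once during preprocessing.
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)   # -100 = position ignored by the loss
    for i, tok in enumerate(token_ids):
        if rng.random() < MASK_PROB:
            labels[i] = tok                            # model must predict the original token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID                    # 80%: replace with the mask token
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
            # remaining 10%: keep the original token
    return inputs, labels

# The same sequence, masked twice, generally yields two different patterns.
seq = [101, 2023, 2003, 1037, 7953, 102]
print(dynamic_mask(seq))
print(dynamic_mask(seq))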

Tasks


Task Papers Share
Language Modelling 76 9.03%
Sentence 55 6.53%
Sentiment Analysis 42 4.99%
Text Classification 33 3.92%
Question Answering 33 3.92%
Classification 24 2.85%
Named Entity Recognition (NER) 19 2.26%
NER 18 2.14%
Natural Language Understanding 16 1.90%

Components


Component  Type
BERT       Language Models

Categories