Detailed notes on RoBERTa
RoBERTa is an extension of BERT with changes to the pretraining procedure. The modifications include: training the model longer, with bigger batches, over more data; removing the next sentence prediction objective; training on longer sequences; and dynamically changing the masking pattern applied to the training data (see the sketch below).
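To make the dynamic-masking change concrete, here is a minimal sketch in plain Python. It applies a fresh random mask every time a sequence is batched rather than fixing the mask once during preprocessing; the [MASK] token id and vocabulary size are assumed placeholders, not values from any specific checkpoint.

```python
import random

MASK_TOKEN_ID = 103   # assumed [MASK] id; depends on the actual vocabulary
VOCAB_SIZE = 30_000   # assumed vocabulary size for the random-replacement branch

def dynamic_mask(token_ids, mask_prob=0.15):
    """Apply BERT-style masking to a fresh copy of the sequence.

    Because this runs each time a batch is built (not once during
    preprocessing), every epoch sees a different masking pattern,
    which is the "dynamic masking" idea used in RoBERTa.
    """
    masked = list(token_ids)
    labels = [-100] * len(token_ids)   # -100 = position ignored by the MLM loss
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels[i] = tok
            r = random.random()
            if r < 0.8:                          # 80%: replace with [MASK]
                masked[i] = MASK_TOKEN_ID
            elif r < 0.9:                        # 10%: replace with a random token
                masked[i] = random.randrange(VOCAB_SIZE)
            # remaining 10%: keep the original token unchanged
    return masked, labels
```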
The original BERT uses subword-level tokenization with a vocabulary size of 30K, which is learned after preprocessing the input.
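A quick way to inspect this, assuming the Hugging Face `transformers` package and the public "bert-base-uncased" checkpoint are available:

```python
from transformers import AutoTokenizer

# Load BERT's pretrained subword tokenizer and check its vocabulary size (~30K).
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
print(bert_tok.vocab_size)

# Words outside the vocabulary are split into subword pieces,
# with continuation pieces marked by the "##" prefix.
print(bert_tok.tokenize("pretraining procedure"))
```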